Abstract

Some item response theory (IRT) techniques "work" in applications even though the usual
structural IRT assumptions, and local independence (LI) in particular, do not hold. When the
departure from local independence is too great, traditional procedures will break down. Although
violations of strictly unidimensional, monotone, locally independent latent structure can sometimes
be modeled and exploited, many situations call for a unidimensional approach that is tolerant of
minor violations of strict unidimensionality (e.g., Drasgow and Parsons, 1983; Spray and Ackerman,
1987; Reckase, 1990). Departures from strict unidimensionality can be detected, and the influence of
these departures on a variety of LI-based ability estimators can be measured. A convenient universe
of models near the LI model in which to investigate structural robustness issues is provided by Stout
(1990b)'s essential unidimensionality modeling approach. In this paper we survey theoretical results
underpinning this approach, and report on work in progress to apply these results in practical
settings.

Keywords: Essential independence, wrong-model analysis, local dependence, maximum likelihood,
information, posterior ability distributions, model-fit indices.
1 Introduction

Traditional unidimensional monotonic Item Response Theory (IRT) provides a useful but overly
simple model of examinees' responses to standardized test questions. For example, Drasgow and
Parsons (1983) assess the shortcomings of the LI-based unidimensionality approach to IRT as
follows:

One way in which most current item response theories (IRTs) are surely incorrect is 
in their assumption of a unidimensional latent trait space . . . [I]t seems clear that
researchers should be more concerned with the robustness of estimation techniques to 
minor violations of dimensionality assumptions than with the possibly never-ending task 
of measuring all latent variables that underlie responses in a particular content domain. 

We are compelled to understand this structural robustness question because it is central to 
current IRT practice. It is widely accepted that the traditional IRT models do not exactly reflect 
the item response process; yet because the traditional inference procedures (in the form of computer 
programs such as LOGIST and BILOG) are so accessible, traditional IRT is applied to item response 
data anyway. Can we trust the inferences from these misspecified models? 

Although violations of strictly unidimensional, $d_I = 1$, structure (i.e., models satisfying the
stronger traditional assumptions of local independence (LI) and monotonicity (M)) can sometimes
be modeled and exploited (whether by introducing new "dependence" parameters as in Jannarone
(1986) and Gibbons, Bock and Hedeker (1989), or by an explicitly multidimensional approach
such as that of Reckase (1990)), many situations call for a unidimensional approach that is tolerant of
minor violations of strict unidimensionality (e.g., Drasgow and Parsons, 1983; Spray and Ackerman,
1987; Reckase, 1990). It is also important to note that an "acceptable" level of departure from strict
unidimensionality may depend on the particular application; for example, ability rank estimates
on a particular section of the Graduate Record Examination may be more tolerant of violations of
unidimensionality than a detailed item analysis of the same items.

Departures from strict unidimensionality can be detected, and the influence of these departures
on a variety of LI-based ability estimators can be measured. In this paper we survey theoretical
results underpinning this approach, and report on work in progress to apply these results in practical
settings. A convenient universe of models near the LI model in which to investigate structural
robustness issues is provided by Stout (1990b)'s essential unidimensionality, $d_E = 1$, modeling
approach. The main ideas of essential independence, summarized in Section 2, are due to Stout
(1987, 1990). The approach to structural robustness for maximum likelihood estimation of ability
outlined in Section 3 is due to Junker (1991b). The more general statistical considerations of
Sections 4 and 5 represent the joint work of Clarke and Junker (1991). Finally, the work on
two new indices of unidimensionality described in Sections 6 and 7 represents ongoing joint work of
Junker and Stout. Owing to the "survey" nature of this paper, once Section 2 is read the remaining
sections may be read in any order.
2 Essential independence

A successful approach to identifying unidimensional latent structure outside the strict LI/M
framework has been pursued in the seminal work of Stout (1987) and Stout (1990b), and extended by
Junker (1988) and Junker (1991b). The main idea, which borrows from both the "large sample
theory" tradition in mathematical statistics and the "factor analysis" tradition in psychometrics,
is that of essential independence.

For any (infinite) sequence of items $\underline{X} = (X_1, X_2, X_3, \ldots)$ (dichotomous or polytomous), we
define bounded item scores to be functions $A_j(X_j)$ for which there exists $M < \infty$ such that $|A_j(X_j)| \le M$ for all $j$.

In the special case that each $X_j$ takes on ordered, discrete values $\xi_{j1} < \xi_{j2} < \cdots$, we will call a
bounded item score an ordered item score if moreover $A_j(\xi_{jk}) \le A_j(\xi_{j,k+1})$ for all $k$ (in the dichotomous
case, $A_j(0) \le A_j(1)$, for example). Also, we define a bounded test score to be the average of the
first $J$ bounded item scores, $\bar{A}_J = \frac{1}{J}\sum_{j=1}^{J} A_j(X_j)$.

Definition 2.1 The infinite item sequence $\underline{X}$ is essentially independent (EI) with respect to $\Theta$ if
and only if

$$\lim_{J\to\infty} \mathrm{Var}(\bar{A}_J \mid \Theta = \theta) = 0 \qquad (1)$$

for all bounded test scores $\bar{A}_J$.

It follows (by Chebyshev's inequality) that $\bar{A}_J$ is a consistent estimator of the "true score"
$\bar{\mathcal{A}}_J(\theta) = E[\bar{A}_J \mid \Theta = \theta]$ as $J \to \infty$. In particular, for a sequence of dichotomous items EI
implies that the proportion-correct score $\bar{X}_J$ consistently estimates values of the test characteristic
curve (TCC) as $J \to \infty$.

Definition 2.2 The item sequence $\underline{X}$ is essentially unidimensional, $d_E = 1$, if and only if

EI: $\underline{X}$ is EI with respect to a unidimensional $\Theta$; and

LAD¹: for every set of ordered item scores, the "true score" $\bar{\mathcal{A}}_J(\theta)$ is nondecreasing in
$\theta$.

If no such unidimensional $\Theta$ exists, we write $d_E > 1$.

All of the theoretical results in this paper apply to polytomously scored (and in some cases
continuously scored) items, but for simplicity we will focus mostly on the familiar dichotomous
case in which $X_j = 0$ or 1.

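When an item sequence has $d_E = 1$ the true score $\bar{\mathcal{A}}_J(\theta)$ may be estimated with $\bar{A}_J$ and then
inverted to produce estimates $\hat\theta = \bar{\mathcal{A}}_J^{-1}(\bar{A}_J)$ of $\theta$ itself; in particular, writing $\bar{T}_J$ for the TCC,
$\bar{T}_J^{-1}(\bar{X}_J) \to \theta$ in probability as $J \to \infty$. The notion of essential unidimensionality, $d_E = 1$, should
be contrasted with strict unidimensionality, $d_I = 1$, under which both local independence (LI) and
monotone (M) increasing ICC's are required. In particular, any model satisfying LI also satisfies EI,
since under LI $\mathrm{Var}(\bar{A}_J \mid \Theta = \theta) = J^{-2}\sum_{j=1}^{J}\mathrm{Var}(A_j(X_j) \mid \Theta = \theta) \le M^2/J$. See Stout (1990b) and
Junker (1991b) for details.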
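The inversion estimator described above can be sketched in a few lines of Python. The sketch assumes, purely for illustration, 2PL ICCs with randomly drawn item parameters and locally independent responses at a single $\theta$; `icc_2pl` and `invert_tcc` are our own hypothetical helper names, not part of any package. Because the TCC is increasing in $\theta$, bisection is enough to invert it.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Assumed 2PL ICCs: P_j(theta) = 1 / (1 + exp(-a_j (theta - b_j)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def invert_tcc(xbar, a, b, lo=-8.0, hi=8.0, tol=1e-10):
    """Solve TCC(theta) = xbar by bisection, where the TCC is the average of the ICCs."""
    def tcc(t):
        return icc_2pl(t, a, b).mean()
    # Proportion-correct scores outside the TCC range on [lo, hi] are clamped to an endpoint.
    if xbar <= tcc(lo):
        return lo
    if xbar >= tcc(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcc(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
J = 5000
a = rng.uniform(0.8, 2.0, size=J)   # illustrative discriminations
b = rng.normal(0.0, 1.0, size=J)    # illustrative difficulties
theta_true = 0.5
x = rng.random(J) < icc_2pl(theta_true, a, b)   # locally independent 0/1 responses
theta_hat = invert_tcc(x.mean(), a, b)
print(theta_hat)
```

With a long test the printed estimate should fall close to $\theta = 0.5$, in line with the consistency of $\bar{T}_J^{-1}(\bar{X}_J)$.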

¹ "LAD" stands for locally asymptotically discriminating. There is a technical detail about the possibility of "flat"
true scores which need not concern us here. See Junker (1991b) or Stout (1990b) for details.
The $d_E = 1$ condition is built out of the good estimation properties of total test scores under
strict unidimensionality. The EI condition is explicitly designed to characterize unidimensional
behavior (the "driving" of $\underline{X}$ by a single dominant trait $\Theta$) under conditions other than local
independence. Nandakumar and Stout (Stout, 1987; Nandakumar, 1987, 1989, 1991a, 1991b) have
investigated the practical assessment of essential unidimensionality in a variety of finite-length tests
with minor violations of the $d_I = 1$ representation.

3 Structural robustness and maximum likelihood 

Stout (1990b)'s definition of essential independence covers a broad range of situations in which
one might wish to assert that the fundamental behavior of the data is unidimensional, although
"nuisance traits" prevent strict $d_I = 1$ from holding. The influences of these nuisance traits are
sufficiently small that it is tempting to use strictly unidimensional estimation techniques to estimate
the dominant (and interesting) trait, rather than to take an explicitly multitrait approach. For
example, maximum likelihood estimation of the dominant or target $\theta$ can be examined in this light.
In the binary (dichotomous) case, in which $X_j$ takes the value 0 or 1 depending on the examinee's
answer to the $j$th item, the $d_I = 1$ likelihood is

$$P[\underline{X}_J = \underline{x}_J \mid \Theta = \theta] = \prod_{j=1}^{J} P_j(\theta)^{x_j}\,(1 - P_j(\theta))^{1 - x_j} \qquad (2)$$

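with monotone item characteristic curves (ICC's) $P_j(\theta) = P[X_j = 1 \mid \Theta = \theta]$. If the log-likelihood $L_J(\theta)$
is sufficiently smooth, the MLE $\hat\theta_J$ must solve the likelihood equation

$$0 = L_J'(\hat\theta_J) = \sum_{j=1}^{J} \lambda_j'(\hat\theta_J)\,[X_j - P_j(\hat\theta_J)], \qquad (3)$$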

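where $\lambda_j(\theta) = \log[P_j(\theta)/(1 - P_j(\theta))]$. It should be noted that the "minimum discrimination" condition
LAD in Definition 2.2 plays a crucial role in the rigorous proof of consistency of $\hat\theta_J$; LAD guarantees
that the average information function $\bar{I}_J(\theta)$ stays bounded away from zero as $J \to \infty$:

$$\bar{I}_J(\theta) = \frac{1}{J}\sum_{j=1}^{J} \lambda_j'(\theta)\,P_j'(\theta), \qquad \liminf_{J\to\infty} \bar{I}_J(\theta) > 0. \qquad (4)$$

See Junker (1991b) for details.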

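Theorem 3.1 Let $\underline{X}$ be a dichotomous item sequence with sufficiently smooth ICC's satisfying EI
and (4). Then there exists a sequence $\{\hat\theta_J : J \ge J_0\}$ of roots of (3) such that

$$\lim_{J\to\infty} P[\,|\hat\theta_J - \theta| < \epsilon \mid \Theta = \theta\,] = 1 \qquad (5)$$

for every $\epsilon > 0$ (i.e., $\hat\theta_J \xrightarrow{P} \theta$, given $\Theta = \theta$, as $J \to \infty$).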
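Indeed, for $t$ near $\theta$, the right-hand side of (3), evaluated at $t$ and divided by $J$, may be expanded as

$$\frac{1}{J}L_J'(t) = \frac{1}{J}\sum_{j=1}^{J}\lambda_j'(\theta)[X_j - P_j(\theta)] + (t - \theta)\Big\{\frac{1}{J}\sum_{j=1}^{J}\lambda_j''(\theta)[X_j - P_j(\theta)] - \frac{1}{J}\sum_{j=1}^{J}\lambda_j'(\theta)P_j'(\theta)\Big\} + O\big((t - \theta)^2\big)$$

$$= o_p(1) - (t - \theta)\{\bar{I}_J(\theta) + o_p(1) + O(t - \theta)\}.$$

Note for example that, by Definition 2.1 applied to the bounded item scores $\lambda_j'(\theta)X_j$ (with $\theta$ fixed),

$$\frac{1}{J}\sum_{j=1}^{J}\lambda_j'(\theta)[X_j - P_j(\theta)] \xrightarrow{P} 0$$

as $J \to \infty$. The other terms are handled similarly. Since $\bar{I}_J(\theta)$ is bounded away from zero by (4),
with probability tending to one $L_J'(t)$ is positive at $t = \theta - \epsilon$ and negative at $t = \theta + \epsilon$ for small $\epsilon$,
so a root of (3) lies within $\epsilon$ of $\theta$.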

In general there are problems with multiple roots of the likelihood equation (3) when the problem 
is set up in this manner; moreover it may be argued that a consistency result for a theoretical solution 
to (3) is of no value in practice. Fortunately, the same method of proof shows that the familiar 
practice of approximating a root of (3) by Newton's method still leads to consistent estimators, 
under EI. 

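Theorem 3.2 Suppose the assumptions of Theorem 3.1 hold, and let $\tilde\theta_J$ be any sequence of
consistent estimates of $\theta$, given $\Theta = \theta$. Then the Newton's method improvement,

$$\hat\theta_J = \tilde\theta_J - \frac{L_J'(\tilde\theta_J)}{L_J''(\tilde\theta_J)},$$

is also consistent for $\theta$.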

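Following the remarks after Definition 2.2, an obvious candidate for the initial guess in Theorem 3.2
is the inverse TCC evaluated at the proportion-correct score, $\tilde\theta_J = \bar{T}_J^{-1}(\bar{X}_J)$, where $\bar{T}_J$ is the TCC.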
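A minimal sketch of Theorem 3.2 in the 2PL case follows, under the same kind of illustrative assumptions (made-up item parameters, simulated locally independent responses). For the 2PL, $\lambda_j(\theta) = a_j(\theta - b_j)$, so $L_J'(\theta) = \sum_j a_j[X_j - P_j(\theta)]$ and $L_J''(\theta) = -\sum_j a_j^2 P_j(\theta)(1 - P_j(\theta))$. The starting value inverts the TCC by linear interpolation on a grid; all function names are hypothetical.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Assumed 2PL ICCs; here lambda_j(theta) = a_j (theta - b_j), so lambda_j' = a_j."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def score(theta, x, a, b):
    """L_J'(theta) = sum_j lambda_j'(theta) [x_j - P_j(theta)], the left side of (3)."""
    return np.sum(a * (x - icc_2pl(theta, a, b)))

def second_derivative(theta, a, b):
    """L_J''(theta) = -sum_j a_j^2 P_j(theta) (1 - P_j(theta)) for the 2PL."""
    p = icc_2pl(theta, a, b)
    return -np.sum(a**2 * p * (1.0 - p))

def newton_step(theta0, x, a, b):
    """One Newton improvement of a starting value theta0."""
    return theta0 - score(theta0, x, a, b) / second_derivative(theta0, a, b)

rng = np.random.default_rng(2)
J = 2000
a = rng.uniform(0.8, 2.0, size=J)
b = rng.normal(0.0, 1.0, size=J)
theta_true = -0.4
x = (rng.random(J) < icc_2pl(theta_true, a, b)).astype(float)

# Starting value: invert the (increasing) TCC at the proportion correct, on a grid.
grid = np.linspace(-6.0, 6.0, 2001)
tcc = icc_2pl(grid[:, None], a, b).mean(axis=1)
theta0 = float(np.interp(x.mean(), tcc, grid))

theta1 = newton_step(theta0, x, a, b)   # the one-step estimator of Theorem 3.2
theta_mle = theta1
for _ in range(25):                     # iterate to solve (3) numerically
    theta_mle = newton_step(theta_mle, x, a, b)
print(theta0, theta1, theta_mle)
```

The first Newton step already lands near $\theta$; further iterations only solve (3) to numerical precision. Because the 2PL log-likelihood is concave in $\theta$, plain Newton iteration is well behaved here.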

In the usual LI ability estimation theory, we expect that the sequence $\hat\theta_J$ will be asymptotically
normal and efficient,

$$\sqrt{J}(\hat\theta_J - \theta) \sim AN\big(0,\, 1/\bar{I}_J(\theta)\big), \qquad (6)$$

as $J \to \infty$, where $\bar{I}_J(\theta)$ is the average test information function introduced in (4). A result like
(6) identifying the standard error of $\hat\theta_J$ is needed to do statistical inference using $\hat\theta_J$, or indeed
merely to know how far to trust $\hat\theta_J$ as an estimator of $\theta$ for the particular fixed $J$ that arise in
applications. However, (6) may fail in the essentially unidimensional case in two interesting ways:
it may be that asymptotic normality holds but the asymptotic variance is no longer $1/\bar{I}_J(\theta)$, or it
may be that asymptotic normality fails completely.
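From (3) and the above results, we see that the consistency and asymptotic distribution of $\hat\theta_J$
are tied up with the behavior of the centered weighted averages

$$\bar{A}_J - \bar{\mathcal{A}}_J(\theta) \qquad (7)$$

$$= \frac{1}{J}\sum_{j=1}^{J} a_j\,[X_j - P_j(\theta)], \qquad (8)$$

with $a_j \equiv \lambda_j'(\theta)$, where the dependence of $a_j$ on $\theta$ does not matter since $\theta$ is fixed. Once again, let

$$\sigma_J^2(\theta) = \mathrm{Var}(\bar{A}_J \mid \Theta = \theta) = \frac{1}{J^2}\sum_{j=1}^{J} a_j^2 P_j(\theta)(1 - P_j(\theta)) + \frac{2}{J^2}\sum_{1 \le i < j \le J} a_i a_j\,\mathrm{Cov}(X_i, X_j \mid \Theta = \theta),$$

and let

$$C_J(\theta) = \frac{2}{J}\sum_{1 \le i < j \le J} a_i(\theta)\,a_j(\theta)\,\mathrm{Cov}(X_i, X_j \mid \Theta = \theta). \qquad (9)$$

Since $a_j^2 P_j(\theta)(1 - P_j(\theta)) = \lambda_j'(\theta)P_j'(\theta)$, this gives $\sigma_J^2(\theta) = [\bar{I}_J(\theta) + C_J(\theta)]/J$.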

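Theorem 3.3 Suppose that the assumptions of Theorem 3.1 hold for the item sequence $\underline{X}$ and the
latent trait $\Theta$. Also suppose, given $\Theta = \theta$, that

$$\frac{\bar{A}_J - \bar{\mathcal{A}}_J(\theta)}{\sigma_J(\theta)} \xrightarrow{\mathcal{D}} N(0, 1). \qquad (10)$$

Finally, suppose $R(J)$ is a function for which $R^2(J)C_J(\theta)/J$ remains bounded. Then, given $\Theta = \theta$,

$$\frac{\bar{I}_J(\theta)}{\sigma_J(\theta)}\,(\hat\theta_J - \theta) \xrightarrow{\mathcal{D}} N(0, 1).$$

Moreover, if $\tilde\theta_J$ is any estimator for which $R(J)(\tilde\theta_J - \theta)$ is bounded in probability, $\hat\theta_J$ from
Theorem 3.2 is also asymptotically normal with the same asymptotic variance.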

Theorems 3.1, 3.2 and 3.3 are structural robustness results: a method of estimating ability
developed under $d_I = 1$ is robust to violations of $d_I = 1$ within the $d_E = 1$ framework, in the sense
that it still converges to $\theta$ as test length $J$ grows. However, this robustness of consistency for $\hat\theta_J$ does
not extend to robustness of variability. Nonefficient and non-normal asymptotic error distributions
for $\hat\theta_J$ can be expected in many $d_E = 1$ situations; the deviation from the "efficient" LI-based
standard error can be expressed in terms of the "index" $C_J(\theta)$. Further details, and extensions to
the polytomous case, may be found in Junker (1991b).

General conditions for asymptotic normality for dependent sums have been established by
Dvoretzky (1972); particular cases that seem useful include mixing CLT's (Iosifescu and Theodorescu,
1969) and methods for associated random variables (Cox and Grimmett, 1984; Newman and
Wright, 1982). Once (10) is deemed acceptable, the asymptotic behavior of $\hat\theta_J$ is determined by
$C_J(\theta)$. When $C_J(\theta)$ is near zero, we can expect the items to behave as though LI were true; when
$C_J(\theta)$ is much larger, we should expect item behavior which can be effectively analyzed only with
a multidimensional model. We shall return to $C_J(\theta)$ and (9) in Section 7 below.
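The role of $C_J(\theta)$ can be made concrete with a short calculation. The Python sketch below (the function name `mle_asymptotic_variance` is ours) takes $a_j = \lambda_j'(\theta)$, $P_j'(\theta)$ and a conditional covariance matrix, and returns $\bar{I}_J(\theta)$, $C_J(\theta) = (2/J)\sum_{i<j} a_i a_j \mathrm{Cov}(X_i, X_j \mid \Theta = \theta)$, and the variance $J\,\mathrm{Var}(\bar{A}_J \mid \theta)/\bar{I}_J(\theta)^2$ of $\sqrt{J}(\hat\theta_J - \theta)$ suggested by linearizing (3). For dichotomous items this variance equals $[\bar{I}_J(\theta) + C_J(\theta)]/\bar{I}_J(\theta)^2$. The usage lines assume the grouped-item structure of Example 5.1 in Section 5.

```python
import numpy as np

def mle_asymptotic_variance(a, p_prime, cov):
    """Return (Ibar_J, C_J, avar) at a fixed theta.

    a       : a_j = lambda_j'(theta)
    p_prime : P_j'(theta)
    cov     : J x J matrix of Cov(X_i, X_j | Theta = theta)
    avar    : J * Var(Abar_J | theta) / Ibar_J**2
    """
    J = len(a)
    ibar = np.mean(a * p_prime)                  # average information, as in (4)
    total = a @ cov @ a                          # sum over all (i, j) of a_i a_j Cov(X_i, X_j)
    diag = np.sum(a**2 * np.diag(cov))           # the i == j terms
    c_J = (total - diag) / J                     # (2/J) * sum_{i<j} a_i a_j Cov(X_i, X_j)
    avar = (total / J) / ibar**2
    return ibar, c_J, avar

# Usage: common ICC P_j(theta) = theta, groups of g0 items with within-group correlation c.
theta, J, g0, c = 0.3, 100, 5, 0.4
pq = theta * (1 - theta)
a = np.full(J, 1.0 / pq)                         # lambda_j'(theta) when P_j(theta) = theta
p_prime = np.ones(J)
block = pq * ((1 - c) * np.eye(g0) + c * np.ones((g0, g0)))
cov = np.kron(np.eye(J // g0), block)
ibar, c_J, avar = mle_asymptotic_variance(a, p_prime, cov)
print(ibar, c_J, avar)    # avar should match theta(1 - theta)[1 + c(g0 - 1)]
```

Setting `c = 0` recovers the LI value $1/\bar{I}_J(\theta) = \theta(1 - \theta)$, so the ratio of the two variances is exactly $1 + C_J(\theta)/\bar{I}_J(\theta)$.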
4 The best local independence model when LI fails

One of the lessons of Section 3 was that it is possible to continue using $d_I = 1$ methods when in fact
$d_I > 1$, as long as $d_E = 1$ holds with respect to the trait you want to measure. The construction
there makes a particular choice for the "unidimensional IRF's" in a fictitious version of the LI
likelihood, namely the "marginal" IRF's in (4) suggested by Stout (1990b). Is this a good choice?
Can it be achieved in practice?

It is valuable to set out general forms of the models we are considering. The "ideal"
random-effects LI-based model for item response (and other) data in psychological measurement is a
mixture model

$$m(\underline{x}_J) = \int r_J(\underline{x}_J \mid \theta)\,dF(\theta), \qquad (11)$$

where $F$ is the distribution of $\Theta$, and $r_J(\underline{x}_J \mid \theta)$ factors as

$$r_J(\underline{x}_J \mid \theta) = \prod_{j=1}^{J} r_j(x_j \mid \theta). \qquad (12)$$

In dichotomous IRT, for example, $r_j(x_j \mid \theta) = \pi_j(\theta)^{x_j}(1 - \pi_j(\theta))^{1 - x_j}$ for some set of ICC's $\pi_j(\theta)$,
but the formulation in (11) and (12) works for arbitrary observations $X_1, \ldots, X_J$ on each individual.
The main statistical task is inference about each individual's unobserved $\theta$ from each individual's
observed $\underline{x}_J$, based on the particular form of the right-hand side of (12).

LI models are an attractive and convenient data analysis tool, and are often assumed even
though it may be agreed that (12) only approximately fits or reflects the mechanisms underlying
the data. Suppose the correct formulation is

$$m(\underline{x}_J) = \int \nu_J(\underline{x}_J \mid \theta)\,dF(\theta), \qquad (13)$$

where the conditional model for $\underline{X}_J$ given $\theta$ is some dependent $\nu_J(\underline{x}_J \mid \theta)$ whose structure is not
known in detail. How far could an analysis based on (11) and (12) go? It is useful to first know
what the "best possible" choice for $r_J(\underline{x}_J \mid \theta)$ is. We shall show that the best choice is indeed
$q_J(\underline{x}_J \mid \theta) = \prod_{j=1}^{J} q_j(x_j \mid \theta)$, where

$$q_j(x_j \mid \theta) = \int \nu_J(\underline{x}_J \mid \theta)\,dx_1 \cdots dx_{j-1}\,dx_{j+1} \cdots dx_J \qquad (14)$$

(when the variables are discrete, as in IRT, the multiple integral here is replaced with a multiple
sum). Recall the Kullback-Leibler distance

$$D(\nu_J \| r_J) = D_\theta(\nu_J \| r_J) = \int \log\frac{\nu_J(\underline{x}_J \mid \theta)}{r_J(\underline{x}_J \mid \theta)}\,\nu_J(\underline{x}_J \mid \theta)\,d\underline{x}_J;$$

models that are close in the Kullback-Leibler sense are also close in other more common senses
such as mean absolute error.

Proposition 4.1 $D_\theta(\nu_J \| r_J)$ is minimized over locally independent $r_J$ of the form (12) by taking $r_J = q_J$.
Indeed, following Aitchison (1975), we note that by (14),

$$D_\theta(\nu_J \| r_J) = D_\theta(\nu_J \| q_J) + \sum_{j=1}^{J} \int \log\frac{q_j(x_j \mid \theta)}{r_j(x_j \mid \theta)}\,q_j(x_j \mid \theta)\,dx_j,$$

which is clearly minimized by taking $r_j = q_j$ in each term of the summation at right, since each of
those terms is itself a nonnegative Kullback-Leibler distance. This identification is completely general
and applies to binary, polytomous, and even continuously scored items. Further details and
extensions of this idea may be found in Section 2 of Clarke and Junker (1991).
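Proposition 4.1 and the decomposition above can be checked numerically in a small discrete case. The sketch assumes $J = 3$ binary items and an arbitrary joint law $\nu_J(\cdot \mid \theta)$ at a fixed $\theta$, drawn at random (nothing IRT-specific), and compares $D(\nu_J \| r_J)$ with $D(\nu_J \| q_J) + \sum_j D(q_j \| r_j)$ for a competing LI model $r_J$ with arbitrary, hypothetical ICC values.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
J = 3
patterns = np.array(list(itertools.product([0, 1], repeat=J)))   # all 2^J response patterns
nu = rng.dirichlet(np.ones(2**J))        # an arbitrary (typically dependent) joint law nu_J(x | theta)

def kl(p, q):
    """Kullback-Leibler distance between two discrete distributions (natural log)."""
    return float(np.sum(p * np.log(p / q)))

def li_law(pi):
    """LI likelihood prod_j pi_j^x_j (1 - pi_j)^(1 - x_j) on every response pattern."""
    return np.prod(np.where(patterns == 1, pi, 1.0 - pi), axis=1)

marginals = nu @ patterns                # P[X_j = 1] under nu_J, i.e. the q_j of (14)
q = li_law(marginals)                    # product of marginals q_J
r_pi = rng.uniform(0.05, 0.95, size=J)   # ICC values of some competing LI model r_J
r = li_law(r_pi)

per_item = sum(kl(np.array([1 - m, m]), np.array([1 - s, s]))
               for m, s in zip(marginals, r_pi))
print(kl(nu, r), kl(nu, q) + per_item)   # the two sides of the decomposition
print(kl(nu, q) <= kl(nu, r))            # Proposition 4.1
```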

In the usual binary IRT context, suppose $d_I = 1$ is violated and let $\nu_J(\underline{x}_J \mid \theta)$ be the true, locally
dependent likelihood for $\underline{x}_J$ given the dominant or target trait $\theta$. Let
$r_J(\underline{x}_J \mid \theta) = \prod_{j=1}^{J} \pi_j(\theta)^{x_j}(1 - \pi_j(\theta))^{1 - x_j}$ be a locally independent likelihood with arbitrary ICC's $\pi_j(\theta)$.
The above proposition shows that if the desire is to analyze data from $\nu_J(\underline{x}_J \mid \theta)$ using a "fictional"
LI-based likelihood of the form $r_J(\underline{x}_J \mid \theta)$, then the best choices for $\pi_j(\theta)$ in $r_J$ are the true marginal
ICC's $P_j(\theta)$ obtained from $\nu_J$ via (14). If $\theta$ is the first coordinate of a multidimensional trait vector
$\underline{\theta}$ with respect to which LI holds using multidimensional response functions $P_j(\underline{\theta})$, then it can be
shown that obtaining $P_j(\theta)$ via (14) is equivalent to

$$P_j(\theta) = \int P_j(\underline{\theta})\,dF(\underline{\theta} \mid \theta_1 = \theta).$$
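As an illustration of this integral, suppose (our assumption, for illustration only) a two-dimensional compensatory logistic ICC $P_j(\theta_1, \theta_2) = 1/(1 + \exp[-(a_1\theta_1 + a_2\theta_2 - d)])$ and a standard bivariate normal trait with correlation $\rho$, so that $\theta_2 \mid \theta_1 \sim N(\rho\theta_1, 1 - \rho^2)$. The integral is then a one-dimensional expectation, which the sketch approximates by Gauss-Hermite quadrature using NumPy's `hermegauss`; `marginal_icc` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def marginal_icc(theta1, a1, a2, d, rho, n_nodes=40):
    """P_j(theta1) = E[P_j(theta1, theta2) | theta1] for an assumed 2-D compensatory logistic
    ICC, with theta2 | theta1 ~ N(rho * theta1, 1 - rho**2)."""
    z, w = hermegauss(n_nodes)            # nodes and weights for the weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)          # rescale so the weights give expectations under N(0, 1)
    theta2 = rho * theta1 + np.sqrt(1.0 - rho**2) * z
    p = 1.0 / (1.0 + np.exp(-(a1 * theta1 + a2 * theta2 - d)))
    return float(np.sum(w * p))

grid = np.linspace(-3.0, 3.0, 13)
vals = [marginal_icc(t, 1.0, 0.8, 0.0, 0.5) for t in grid]
print(np.round(vals, 3))
```

When $a_2 = 0$ the marginal ICC reduces to the ordinary logistic curve in $\theta_1$, which gives a simple check of the quadrature.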

It must be noted that in practice the selection of $r_J$ in (12) is itself often subject to
uncertainty, in that the $r_j(\cdot \mid \theta)$ are typically selected from a parametric family $r_{\underline{\alpha}}$ whose parameters
$\alpha_1, \alpha_2, \ldots$ are estimated from (some subset of) the data. Tsutakawa and Soltys (1988) and Albert
(1991) provide important insights into correctly addressing this issue. On the other hand, there is
some evidence that this "best possible" case may approximately be achieved in some large-scale
educational testing applications, for example. Wang (1986, 1987) has identified that component
$\vartheta$ of $\underline{\theta} = (\theta_1, \ldots, \theta_d)$ in a multidimensional compensatory logistic IRT model which is measured
by a fitted unidimensional logistic IRT model. Wang's "reference composite" $\vartheta$ is essentially the
first component of that rotation of $\underline{\theta}$ which produces a principal components analysis of the
information matrix $I(\underline{\theta})$, and she argues that popular IRT model-fitting programs such as LOGIST
and BILOG produce stable estimates for item characteristic curves with respect to the reference
composite.

5 Structural robustness and posterior distributions 

As we have seen, identifying the "product of marginals" $q_J(\underline{x}_J \mid \theta)$ as the best LI likelihood to
use was easiest to accomplish by considering the general statistical models of Section 4. In the 
same way, the asymptotic consistency results of Section 3 can best be understood and extended by 
considering more general models. Let us continue to use the general notation of Section 4. 
5.1 Another route to MLE consistency under EI

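By analogy with (3), define

$$L_J(\theta) = \log q_J(\underline{X}_J \mid \theta) = \sum_{j=1}^{J} \log q_j(X_j \mid \theta), \qquad (15)$$

and, by analogy with the Kullback-Leibler distance, define

$$D_J(\theta, t) = \frac{1}{J}\,[L_J(\theta) - L_J(t)] = \frac{1}{J}\sum_{j=1}^{J} \log\frac{q_j(X_j \mid \theta)}{q_j(X_j \mid t)}.$$

Let us also abbreviate $q_J^\theta(\cdot) = q_J(\cdot \mid \theta)$. $L_J(\theta)$ would be the log-likelihood if LI were true, but we
are not assuming LI here: i.e., $q_J(\cdot \mid \theta)$ may not be the true likelihood function. Finally, for each $t$,
define $B_\delta(t) = \{r : |r - t| < \delta\}$. In this general setting we may obtain consistency of the MLE
under the following assumptions, without making explicit assumptions about the violations of LI.

Assumption C1. For each $\theta$ and $t \ne \theta$, there exists $c(t) > 0$ such that

$$\lim_{J\to\infty} P[D_J(\theta, t) > c(t) \mid \theta] = 1.$$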



Assumption C2. For all $t \ne \theta$ and all $\epsilon > 0$, there exists $\delta > 0$ such that

$$\lim_{J\to\infty} P\Big[\inf_{r \in B_\delta(t)} D_J(t, r) > -\epsilon \,\Big|\, \theta\Big] = 1.$$



Assumption C3. For all $\delta > 0$ there exist $c_\delta > 0$ and $A$ sufficiently large (depending on $\delta$) such that
$\liminf_{J\to\infty} P\big[\inf_{|r| > A} D_J(\theta, r) > c_\delta \,\big|\, \theta\big] > 1 - \delta$.

Under these assumptions we obtain the following proposition, which ensures consistency of the MLE
and furthermore gives an "asymptotic convexity" which will be useful later: $L_J(\theta)$ dominates
$L_J(r)$ as $J \to \infty$, for all $r$ "away from" $\theta$. The domination will be used in Theorem 5.1 to establish
asymptotic normality of the posterior ability distribution constructed from the LI likelihood $q_J$,
even though LI fails. The proof may be found in Clarke and Junker (1991). Straightforward
modifications also give consistency of the posterior mode.



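Proposition 5.1 Under Assumptions C1 through C3, for all $\epsilon > 0$ and all $\delta > 0$, there exists
$\gamma = \gamma(\epsilon, \delta) > 0$ such that

$$\liminf_{J\to\infty} P\Big[\inf_{|r - \theta| \ge \epsilon} D_J(\theta, r) > \gamma \,\Big|\, \theta\Big] \ge 1 - \delta, \qquad (16)$$

and hence the formal MLE $\hat\theta_J \xrightarrow{\nu} \theta$ as $J \to \infty$ (where $\xrightarrow{\nu}$ denotes convergence in $\nu_J$-probability).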
Assumptions C1-C3 are what is needed to make the proof work. Ideally we would like
Proposition 5.1 under Stout's EI condition, as given in Definition 2.1. An appropriate generalization of
EI is the law of large numbers (LLN)

$$\lim_{J\to\infty} P\Big[\,\Big|\frac{1}{J}\sum_{j=1}^{J}\{a_j(X_j) - E[a_j(X_j) \mid \theta]\}\Big| > \epsilon \,\Big|\, \theta\Big] = 0 \quad \text{for all } \epsilon > 0 \qquad (17)$$

for all bounded sequences of functions $\{a_j(X_j) : j = 1, 2, \ldots\}$ (for any type of $X_j$'s whatsoever).
However, the proof of Proposition 5.1 depends on a LLN that holds for sums of log-contrast functions
$D_J(\theta, t) = \frac{1}{J}\sum_{j=1}^{J}\log[q_j(X_j \mid \theta)/q_j(X_j \mid t)]$, whose summands need not be bounded. (In
polytomous or dichotomous IRT settings, since each item has only finitely many possible responses,
$D_J(\theta, t)$ would have bounded summands, so that (17) suffices; see Proposition 5.2 below.) Lemmas
5.1 and 5.2 show precisely what LLN's are needed in general to obtain Assumptions C1 and
C2. The proofs of the lemmas are straightforward bounding-in-probability arguments and are
omitted.

Lemma 5.1 Suppose

(a) For each $t \ne \theta$ there exists $\beta(t) > 0$ such that $\liminf_{J\to\infty}(1/J)\,D(q_J^\theta \| q_J^t) > \beta(t)$;

(b) As $J \to \infty$, $D_J(\theta, t) - (1/J)\,D(q_J^\theta \| q_J^t) \xrightarrow{\nu} 0$.

Then Assumption C1 holds.
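Because the marginals of the data under $\nu_J$ are the $q_j$, the mean of $D_J(\theta, t)$ is $(1/J)D(q_J^\theta \| q_J^t)$, and condition (b) asks that $D_J(\theta, t)$ concentrate around this mean even though the $X_j$ are dependent. The sketch below assumes the grouped dependence of Example 5.1 (common ICC $P_j(\theta) = \theta$, groups of $g_0$ items with within-group correlation $c$) and generates it with a hypothetical device of our own: each item copies a shared group draw with probability $\sqrt{c}$ and otherwise draws independently, which gives $P[X_j = 1 \mid \theta] = \theta$ and within-group correlation exactly $c$.

```python
import numpy as np

def grouped_responses(theta, J, g0, c, rng):
    """Hypothetical generator: P[X_j = 1 | theta] = theta; items in the same group of g0
    have correlation c; different groups are independent given theta."""
    n_groups = J // g0
    shared = rng.random(n_groups) < theta               # one shared draw per group
    copy = rng.random((n_groups, g0)) < np.sqrt(c)      # item copies the shared draw w.p. sqrt(c)
    own = rng.random((n_groups, g0)) < theta            # otherwise an independent draw
    return np.where(copy, shared[:, None], own).ravel().astype(float)

def d_J(x, theta, t):
    """D_J(theta, t) = (1/J) sum_j log[q_j(X_j | theta) / q_j(X_j | t)], q_j Bernoulli."""
    xbar = x.mean()
    return xbar * np.log(theta / t) + (1 - xbar) * np.log((1 - theta) / (1 - t))

def kl_bernoulli(theta, t):
    """(1/J) D(q_J^theta || q_J^t) when every q_j is Bernoulli(theta)."""
    return theta * np.log(theta / t) + (1 - theta) * np.log((1 - theta) / (1 - t))

rng = np.random.default_rng(4)
theta, t = 0.6, 0.4
x = grouped_responses(theta, J=20000, g0=5, c=0.5, rng=rng)
print(d_J(x, theta, t), kl_bernoulli(theta, t))
```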

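Lemma 5.2 Suppose that, for all $t \ne \theta$, there exists $\delta_t > 0$ such that

(a) $\forall \epsilon > 0\ \exists \delta \in (0, \delta_t)$ such that $\liminf_{J\to\infty} \inf_{r \in B_\delta(t)} E[D_J(t, r) \mid \theta] > -\epsilon/2$;

(b) $\forall \epsilon > 0\ \exists \delta \in (0, \delta_t)$ such that $\lim_{J\to\infty} P\big[\sup_{r \in B_\delta(t)} |D_J(t, r) - E[D_J(t, r) \mid \theta]| < \epsilon/2 \,\big|\, \theta\big] = 1$.

Then Assumption C2 holds.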

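Let us specialize these results to a polytomous IRT setting. Recall that each observable variable
$X_j$ has $k_j$ values $\xi_{j1}, \ldots, \xi_{jk_j}$ (the subject makes one of $k_j$ responses for each item), with each $k_j \le k_0$
for some fixed $k_0 < \infty$. The LI likelihood is $q_J(\underline{x}_J \mid \theta) = \prod_{j=1}^{J} q_j(x_j \mid \theta)$, where

$$q_j(x_j \mid \theta) = \prod_{l=1}^{k_j} P_{jl}(\theta)^{y_{jl}},$$

$P_{jl}(\theta) = P[X_j = \xi_{jl} \mid \Theta = \theta]$, and $y_{jl} = 1\{x_j = \xi_{jl}\}$ (with $Y_{jl} = 1\{X_j = \xi_{jl}\}$ for the corresponding random variables).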

Proposition 5.2 Suppose that EI and LAD hold, and that the response curves $P_{jl}$ satisfy

$$\text{for each } t, \quad 0 < \inf_{j,l} P_{jl}(t) \le \sup_{j,l} P_{jl}(t) < 1; \qquad (18)$$

$$P_{jl}(t) \text{ is continuous at each } t, \text{ uniformly in } j \text{ and } l; \qquad (19)$$

and suppose Assumption C3 holds. Then the "wrong model MLE" $\hat\theta_J$ is $\nu_J$-consistent for $\theta$, as
$J \to \infty$.
Proof. We will verify the conditions of Lemma 5.1 and Lemma 5.2. It follows from an inequality
of Csiszár (1975), $D(f \| g) \ge \frac{1}{4}\big[\int |f(t) - g(t)|\,dt\big]^2$, that

$$\frac{1}{J}\,D(q_J^\theta \| q_J^t) \ge \frac{1}{4J}\sum_{j=1}^{J}\Big[\sum_{l=1}^{k_j}|P_{jl}(\theta) - P_{jl}(t)|\Big]^2, \qquad (20)$$

which is bounded away from zero under LAD (consider $\{a_{jl}\}$ for which $a_{jk_j} = 1$ and $a_{jl} = 0$ for all
$l < k_j$). This is (a) of Lemma 5.1. On the other hand, (b) of Lemma 5.1 follows from Definition 2.1
and (18), since the summands of $D_J(\theta, t)$ are bounded.

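The continuity condition (a) of Lemma 5.2 follows from (19). Condition (b) of Lemma 5.2 requires that

$$\lim_{J\to\infty} P\Big[\sup_{r \in B_\delta(t)} |D_J(t, r) - E[D_J(t, r) \mid \theta]| < \epsilon \,\Big|\, \theta\Big] = 1$$

for every $\epsilon$ and appropriate $\delta$. The expression in absolute values may be written as

$$\Big|\frac{1}{J}\sum_{j=1}^{J}\sum_{l=1}^{k_j} \log\frac{P_{jl}(t)}{P_{jl}(r)}\,[Y_{jl} - P_{jl}(\theta)]\Big|,$$

which will tend to zero uniformly in $r \in B_\delta(t)$ by Definition 2.1, (18) and (19). □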
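The inequality behind (20) is easy to check numerically for discrete distributions such as $(P_{j1}(\theta), \ldots, P_{jk_j}(\theta))$ and $(P_{j1}(t), \ldots, P_{jk_j}(t))$. The sketch below draws random pairs of categorical distributions with two to five categories (an illustrative choice) and confirms $D(f \| g) \ge \frac{1}{4}[\sum |f - g|]^2$ with natural logarithms. Pinsker's inequality gives the larger constant 1/2, so the bound used in (20) is conservative.

```python
import numpy as np

rng = np.random.default_rng(5)
holds = []
for _ in range(1000):
    k = int(rng.integers(2, 6))          # number of response categories
    f = rng.dirichlet(np.ones(k))        # e.g. (P_j1(theta), ..., P_jk(theta))
    g = rng.dirichlet(np.ones(k))        # e.g. (P_j1(t), ..., P_jk(t))
    kl = np.sum(f * np.log(f / g))       # D(f || g)
    l1 = np.sum(np.abs(f - g))           # sum_l |f_l - g_l|
    holds.append(kl >= 0.25 * l1**2)
print(all(holds))
```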

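Assumption C3 may often be verified directly. Consider the case of binary response data, in
which $k_j = 2$, $\xi_{j1} = 0$ and $\xi_{j2} = 1$, and the response curves are of the three-parameter logistic form

$$P_{j2}(\theta) = c_j + (1 - c_j)\,\frac{1}{1 + \exp[-a_j(\theta - b_j)]}.$$

Then $D_J(\theta, r) = (1/J)\sum_{j=1}^{J}[\ell_j(\theta) - \ell_j(r)]$, where

$$\ell_j(r) = X_j \log\frac{e^{a_j(r - b_j)} + c_j}{1 - c_j} - \log\frac{1 + e^{a_j(r - b_j)}}{1 - c_j}$$

is the log-likelihood contribution $X_j \log P_{j2}(r) + (1 - X_j)\log[1 - P_{j2}(r)]$ of item $j$.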
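The expression for $\ell_j(r)$ is easy to evaluate, and doing so previews its behavior as $r \to \pm\infty$. The sketch below evaluates $-\ell_j(r)$ for a single 3PL item with illustrative parameters $a = 1.2$, $b = 0.3$, $c = 0.2$ (our assumptions), computing $\log(1 - P)$ as $\log(1 - c) - \log(1 + e^u)$ with $u = a(r - b)$ to avoid cancellation when $P$ is close to one.

```python
import numpy as np

def neg_ell(r, x, a, b, c):
    """-ell_j(r) = -[x log P(r) + (1 - x) log(1 - P(r))] for a 3PL item, with u = a (r - b),
    P(r) = c + (1 - c) / (1 + e^{-u}) and 1 - P(r) = (1 - c) / (1 + e^{u})."""
    u = a * (r - b)
    log_p = np.log(c + (1.0 - c) / (1.0 + np.exp(-u)))
    log_1mp = np.log1p(-c) - np.logaddexp(0.0, u)     # stable log(1 - P(r))
    return -(x * log_p + (1.0 - x) * log_1mp)

a, b, c = 1.2, 0.3, 0.2
for x in (1, 0):
    print(x, neg_ell(50.0, x, a, b, c), neg_ell(-50.0, x, a, b, c))
```

For large positive $r$ the value is near 0 when $X_j = 1$ and grows without bound when $X_j = 0$; for large negative $r$ it settles at $-\log c$ or $-\log(1 - c)$.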



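From this expression for $\ell_j(r)$, we obtain

$$\lim_{r\to\infty} -\ell_j(r) = \begin{cases} 0, & \text{if } X_j = 1,\\ \infty, & \text{if } X_j = 0;\end{cases} \qquad \lim_{r\to-\infty} -\ell_j(r) = -\log\big[c_j^{X_j}(1 - c_j)^{1 - X_j}\big],$$

and we see that Assumption C3 holds as long as $P[X_j = 1\ \forall j \mid \theta] = P[X_j = 0\ \forall j \mid \theta] = 0$; this in
turn follows from Definition 2.1 and (18), which merely requires that the $a_j$'s, $b_j$'s and $c_j$'s do not
"wander off" to the edges of their parameter spaces.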
5.2 Asymptotic posterior normality under EI

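We now turn to the possibility of basing inference for $\theta$ on the formal posterior distribution

$$\omega_q(\theta \mid \underline{X}_J) = \frac{q_J(\underline{X}_J \mid \theta)\,\omega(\theta)}{\int q_J(\underline{X}_J \mid r)\,\omega(r)\,dr}, \qquad (21)$$

where $\omega(\theta)$ is the prior density on $\theta$. Of course, the true posterior distribution is

$$\omega_\nu(\theta \mid \underline{X}_J) = \frac{\nu_J(\underline{X}_J \mid \theta)\,\omega(\theta)}{\int \nu_J(\underline{X}_J \mid r)\,\omega(r)\,dr}.$$

The point once again is to see whether a "wrong model analysis" based on the LI likelihood $q_J$ can
work when $\nu_J$ is the correct conditional law. Let us make the following regularity assumptions.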
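Assumption PN1. Let $I_j(\theta) = E[(\partial \log q_j(X_j \mid \theta)/\partial\theta)^2 \mid \theta]$ and $\bar{I}_J(\theta) = (1/J)\sum_{j=1}^{J} I_j(\theta)$. We
assume there exist $0 < \epsilon_\theta < M_\theta < \infty$ such that $\epsilon_\theta < \bar{I}_J(\theta) < M_\theta$ for all large $J$.

Assumption PN2. $\int \partial^2 q_j(x \mid \theta)/\partial\theta^2\,dx = 0$.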

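Assumption PN3. $M_{\epsilon,j}(x, \theta) = \sup_{r \in B_\epsilon(\theta)} |\partial^2 \log q_j(x \mid r)/\partial r^2 - \partial^2 \log q_j(x \mid \theta)/\partial\theta^2|$ is bounded
uniformly in $x$ and $j$, for small $\epsilon > 0$; and for $\bar{M}_J(\epsilon, \theta) = (1/J)\sum_{j=1}^{J} M_{\epsilon,j}(X_j, \theta)$,

$$\lim_{\epsilon\to 0}\,\limsup_{J\to\infty}\, E[\bar{M}_J(\epsilon, \theta) \mid \theta] = 0.$$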



Assumption PN4. The prior density $\omega(t)$ is positive and continuous throughout a small
neighborhood of $\theta$.

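Theorem 5.1 Assume EI as in (17), and the conclusion of Proposition 5.1. Under the additional
assumptions PN1 through PN4, for all $a < b$,

$$\int_{\hat\theta_J + a\sigma_J}^{\hat\theta_J + b\sigma_J} \omega_q(\theta \mid \underline{X}_J)\,d\theta \xrightarrow{\nu} \Phi(b) - \Phi(a) \qquad (22)$$

as $J \to \infty$, where $\sigma_J = \{-L_J''(\hat\theta_J)\}^{-1/2}$, and $\Phi(\cdot)$ is the standard normal c.d.f.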

Hence, in contrast to Theorem 3.3, which shows that the asymptotic distribution of the MLE
is sensitive to departures from strict unidimensionality, Theorem 5.1 suggests that the asymptotic
posterior ability distribution cannot "detect" such departures. While this may initially seem to be
good news, it actually undermines the desirability of basing inference about $\theta$ on a wrong-model
posterior. We shall return to this point at the end of the section.

The proof of this result, and extensions to situations in which EI fails, may be found in Clarke
and Junker (1991). Chung (1991) has independently produced a proof of this result in the
traditional, LI-based, dichotomous IRT setting. In both cases, the calculations are modeled after Walker
(1969). Straightforward modifications give consistency of the posterior mean and higher posterior
moments.
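The content of Theorem 5.1 can be seen in the simplest case. The sketch below assumes LI Bernoulli items with common ICC $P_j(\theta) = \theta$, a uniform prior on (0, 1), and hypothetical data with $J = 400$ and 120 correct responses. Then $\hat\theta_J = 120/400$ and $-L_J''(\hat\theta_J) = J/[\hat\theta_J(1 - \hat\theta_J)]$; the formal posterior (21) is evaluated on a grid, and its mass on $[\hat\theta_J - \sigma_J, \hat\theta_J + \sigma_J]$ is compared with $\Phi(1) - \Phi(-1)$, the case $a = -1$, $b = 1$ of (22).

```python
import math
import numpy as np

J, k = 400, 120                                        # hypothetical test length and number correct
theta_hat = k / J                                      # LI MLE when P_j(theta) = theta for all j
sigma_J = math.sqrt(theta_hat * (1 - theta_hat) / J)   # {-L_J''(theta_hat)}^(-1/2)

grid = np.linspace(1e-6, 1 - 1e-6, 200_001)
log_post = k * np.log(grid) + (J - k) * np.log1p(-grid)   # log q_J + log(uniform prior), up to a constant
post = np.exp(log_post - log_post.max())                  # unnormalized posterior on the grid
inside = np.abs(grid - theta_hat) <= sigma_J
mass = post[inside].sum() / post.sum()                    # Riemann-sum ratio on a uniform grid
target = math.erf(1 / math.sqrt(2))                       # Phi(1) - Phi(-1)
print(mass, target)
```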

The next proposition specializes the result to essentially unidimensional polytomous IRT models.
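Proposition 5.3 Suppose, in addition to the assumptions of Proposition 5.2, that

$$\frac{\partial^2}{\partial\theta^2}\log P_{jl}(\theta) \text{ is bounded pointwise in } \theta, \text{ uniformly in } j \text{ and } l. \qquad (23)$$

Then, in the sense of (22), the formal posterior $\omega_q(\theta \mid \underline{X}_J)$ is asymptotically normal with mean $\hat\theta_J$
and standard deviation $\sigma_J$:

$$\frac{\theta - \hat\theta_J}{\sigma_J}\ \Big|\ \underline{X}_J \ \to\ N(0, 1).$$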

Proof. Assumptions PN2 and PN4 are usually true "by fiat," so it is only interesting to
consider Assumptions PN1 and PN3. Proposition 4.1 of Junker (1991b) shows that
Assumption PN1 holds under LAD and differentiability conditions (the argument is similar to the
one bounding (20) away from zero). The uniform continuity condition of Assumption PN3 focuses
on a locally uniform bound for

$$\frac{\partial^2}{\partial\theta^2}\log q_j(x_j \mid \theta) = \sum_{l=1}^{k_j} y_{jl}\,\frac{\partial^2}{\partial\theta^2}\log P_{jl}(\theta), \qquad (24)$$

which follows from (23), due to the boundedness of the $y_{jl}$'s. □



Example 5.1 Stout (1990b) and Junker (1991b) consider binary responses $X_1, X_2, X_3, \ldots$ having
the same response curve $P[X_j = 1 \mid \theta] = \theta$. Suppose that the items are arranged in successive groups
of $g_0$ items as $X_1, X_2, \ldots, X_{g_0};\ X_{g_0+1}, X_{g_0+2}, \ldots, X_{2g_0};$ etc., such that different groups of $g_0$ items
are independent of one another, given $\theta$, and items within a single group are positively correlated,
given $\theta$, with (for $i \ne j$)

$$\mathrm{Corr}(X_i, X_j \mid \theta) = \begin{cases} c & \text{if } X_i \text{ and } X_j \text{ are in the same group},\\ 0 & \text{if not},\end{cases}$$

for some fixed $c \in (0, 1]$. This $\nu_J$ is a naive model for a paragraph comprehension test in which
several paragraphs are presented and $g_0$ questions are asked for each paragraph. Here, $\theta$ represents
a trait common to all the items, which we might wish to think of as reading comprehension; and
the nonzero correlations are induced by nuisance traits, for example, specific knowledge about the
subject matter of the paragraph at hand.

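EI and LAD hold in this case, and it follows from Proposition 5.2 and Theorem 3.3 that

$$\sqrt{J}(\hat\theta_J - \theta) \xrightarrow{\mathcal{D}} N(0, \sigma^2),$$

where $\hat\theta_J = \bar{x}_J$, and $\sigma^2 = \theta(1 - \theta)[1 + c(g_0 - 1)]$ is somewhat inflated over the anticipated asymptotic
variance $\theta(1 - \theta)$ under LI. On the other hand, here $-L_J''(\hat\theta_J) = J/[\hat\theta_J(1 - \hat\theta_J)]$, so $\sigma_J^2 = \hat\theta_J(1 - \hat\theta_J)/J$
carries no inflation factor, and it follows from Proposition 5.3 that

$$\frac{\theta - \hat\theta_J}{\sigma_J}\ \Big|\ \underline{X}_J \ \to\ N(0, 1).$$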
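Example 5.1 is easy to simulate. The sketch below generates the grouped dependence with a device of our own choosing (each item copies a shared group draw with probability $\sqrt{c}$, otherwise draws independently; any mechanism with the stated correlations gives the same variances), with illustrative values $\theta = 0.3$, $g_0 = 5$, $c = 0.4$, $J = 100$. It compares $J$ times the Monte Carlo mean squared error of $\hat\theta_J = \bar{x}_J$ with $\theta(1 - \theta)[1 + c(g_0 - 1)]$, and the average of $J\sigma_J^2 = \hat\theta_J(1 - \hat\theta_J)$, the scale of the LI-based posterior, with $\theta(1 - \theta)$.

```python
import numpy as np

def grouped_responses(theta, n, J, g0, c, rng):
    """n examinees at the same theta; P[X_j = 1 | theta] = theta; within-group correlation c."""
    n_groups = J // g0
    shared = rng.random((n, n_groups)) < theta
    copy = rng.random((n, n_groups, g0)) < np.sqrt(c)
    own = rng.random((n, n_groups, g0)) < theta
    return np.where(copy, shared[:, :, None], own).reshape(n, J).astype(float)

rng = np.random.default_rng(6)
theta, J, g0, c, n = 0.3, 100, 5, 0.4, 20000
X = grouped_responses(theta, n, J, g0, c, rng)
theta_hat = X.mean(axis=1)                        # the MLE under the LI likelihood
mle_var = J * np.mean((theta_hat - theta) ** 2)   # Monte Carlo estimate of J * E[(theta_hat - theta)^2]
post_var = np.mean(theta_hat * (1 - theta_hat))   # average of J * sigma_J^2
print(mle_var, theta * (1 - theta) * (1 + c * (g0 - 1)))   # MLE: inflated variance
print(post_var, theta * (1 - theta))                       # posterior scale: not inflated
```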
In Example 5.1, the asymptotic distribution of the MLE has an inflated variance, due to the
departure from strict unidimensionality, but the asymptotic posterior does not. Moreover, careful
examination of Theorem 3.3 and Theorem 5.1 makes it clear that the asymptotic distribution of
the MLE is always potentially sensitive to any "local dependence" in the data, even when $d_E = 1$
(Definition 2.2) holds, while the asymptotic posterior distribution under $d_E = 1$ never is. Clarke and
Junker (1991) also examine this phenomenon in some $d_E > 1$ situations. It is widely believed that
the two paradigms, likelihood-based inference and posterior-based inference, are philosophically
different but "asymptotically the same", except in bizarre situations. But the perfectly reasonable
desire to analyze IRT data using unidimensional models that are tolerant of minor violations of
strict unidimensionality has led us into a situation in which the asymptotics come out differently,
even for "typical" cases. How can we make sense of this?

On the one hand, the LI-based MLE $\hat\theta_J$ is really an M-estimator with a particular choice of
objective function, namely the product of the one-dimensional data marginals of $\nu_J$, which we have
denoted $q_J$. Thus we may interpret the asymptotic distribution of the M-estimator $\hat\theta_J$ as a measure
of estimation error under $\nu_J$ without difficulty; in particular, we need not assume that the data
actually came from $q_J$ to arrive at this interpretation.

On the other hand, our approximation to the LI-based posterior shows that it concentrates at
the LI-based M-estimator (cf. equation (22)), but its "asymptotic rate of concentration" is harder
to interpret: LI-based asymptotic posterior standard errors say how much the LI-based posterior
is concentrated around the M-estimator, but not how much the LI-based posterior is concentrated
around the $\theta$ which "generated" $\underline{X}_J$. If an LI model really held, then Bayes' rule would allow us to
interpret the LI-based posterior, and hence its asymptotically normal approximation, in the usual
sense of updating belief about where $\theta$ was after looking at the data. If LI does not hold, then we
cannot appeal to Bayes' rule for this interpretation, and the LI-based posterior is interesting only
because it corresponds to what is done in practice. Perhaps the only justifiable interpretation of
$\omega_q$ is a counterfactual: "If LI were true, this is where we would think $\theta$ was."

Although both MLE and Bayes paradigms lead to consistent estimators when the LI-based
likelihood $q_J$ is substituted for the true dependent likelihood $\nu_J$, correct calculation and interpretation
of the variability of the estimators depends on a more careful analysis of the stochastic behavior of
the data-generating mechanism. Detecting situations in which this must be done is the major goal
of the work reported in Sections 6 and 7.

6 A global index of unidimensionality 

Stout (1987) proposes a statistical test of unidimensionality for binary IRT data, which has been
further investigated by Stout and Nandakumar (1987, 1989, 1991a, 1991b). The test statistic is
based on a quantity which may be interpreted as an estimate of the measure

$$\varepsilon_J = \binom{J}{2}^{-1} \sum_{1 \le i < j \le J} E\left|\mathrm{Cov}(X_i, X_j \mid \Theta)\right|$$

of unidimensionality of IRT data. Note that under $d_L = 1$ the covariances are identically zero, so
that $\varepsilon_J \equiv 0$. Under $d_E = 1$, the covariances tend to zero as $J$ grows, and hence $\varepsilon_J \to 0$ for $d_E = 1$
data. If the data are dramatically multidimensional, the covariances will be predominantly nonzero
and we expect $\varepsilon_J > 0$. This measure can be estimated directly with the index

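$$\hat\varepsilon_J = \binom{J}{2}^{-1} \sum_{k=0}^{J} \frac{N_k}{N} \sum_{1 \le i < j \le J} \left|\widehat{\mathrm{Cov}}(X_i, X_j \mid X_+ = k)\right|,$$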

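where $X_+$ is the total score on the whole test, $N_k$ is the number of examinees with total score $k$
out of $J$ on the whole test, $N$ is the total number of examinees, and the estimate
$\widehat{\mathrm{Cov}}(X_i, X_j \mid X_+ = k)$ is obtained in the usual way as
$\frac{1}{N_k}\sum_{n=1}^{N_k}(x_{ni} - \bar x_i)(x_{nj} - \bar x_j)$, with $\bar x_i = \frac{1}{N_k}\sum_{n=1}^{N_k} x_{ni}$ (the sums extending only over
examinees in the $k$th cohort).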
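As a rough illustration, the raw index might be computed as in the minimal Python sketch below (NumPy assumed). The $N_k/N$ weighting of the cohorts follows the estimator as written above and should be read as an assumption of the sketch; the function name `raw_epsilon_hat` is hypothetical.

```python
import numpy as np


def raw_epsilon_hat(x):
    """Raw conditional-covariance index for an (N, J) array of 0/1 responses.

    Cohorts are formed by the total score X_+; within each cohort the
    pairwise covariances use the 1/N_k normalization, their absolute values
    are summed over pairs i < j and weighted by N_k / N (assumed weighting),
    and the result is divided by J choose 2.
    """
    x = np.asarray(x, dtype=float)
    n_examinees, n_items = x.shape
    total = x.sum(axis=1)                    # X_+ for each examinee
    upper = np.triu_indices(n_items, k=1)    # all item pairs with i < j
    n_pairs = n_items * (n_items - 1) / 2.0
    eps = 0.0
    for k in np.unique(total):
        cohort = x[total == k]               # examinees with X_+ = k
        n_k = cohort.shape[0]
        centered = cohort - cohort.mean(axis=0)
        cov = centered.T @ centered / n_k    # 1/N_k sample covariance matrix
        eps += (n_k / n_examinees) * np.abs(cov[upper]).sum()
    return eps / n_pairs


# Tiny worked example: only the X_+ = 1 cohort has a nonzero covariance (-0.25).
x_demo = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
print(raw_epsilon_hat(x_demo))  # 0.125
```

Note that in the worked example the only nonzero covariance is negative even though nothing multidimensional is going on; its absolute value still counts toward the index.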

The ideal behavior of this index should be

$\hat\varepsilon_J \approx 0$ if $d_E = 1$;
$\hat\varepsilon_J > 0$ if $d_E > 1$.

Initial study of this index showed that $\hat\varepsilon_J$ was greatly inflated in unidimensional cases. The inflation
could be attributed to either of two causes: some covariances were nonzero because of natural
random variability in the data, and others were nonzero because, in many strictly unidimensional
models, $\mathrm{Cov}(X_i, X_j \mid X_+ = k) < 0$ may occur even though $\mathrm{Cov}(X_i, X_j \mid \Theta = \theta) = 0$ for all $\theta$ (see Junker
(1991a) for a theoretical discussion of this point). Since the absolute values of the covariances are
summed in calculating $\hat\varepsilon_J$, these latter negative covariances, which are in fact due to
unidimensionality, were counted against unidimensionality in the index.
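One elementary way to see why covariances conditional on the total score lean negative (an identity added here for illustration, not taken from Junker (1991a)): within a cohort $X_+ = k$ the total score has no variability, so

$$0 = \mathrm{Var}(X_+ \mid X_+ = k) = \sum_{j=1}^{J} \mathrm{Var}(X_j \mid X_+ = k) + 2 \sum_{1 \le i < j \le J} \mathrm{Cov}(X_i, X_j \mid X_+ = k),$$

and therefore

$$\sum_{1 \le i < j \le J} \mathrm{Cov}(X_i, X_j \mid X_+ = k) = -\frac{1}{2} \sum_{j=1}^{J} \mathrm{Var}(X_j \mid X_+ = k) \le 0.$$

The same identity holds exactly for the $1/N_k$ sample covariances within each cohort, so the estimated covariances in a cohort must sum to a negative number whenever any item varies within it, whatever the dimensionality of the data.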

To remedy the situation, the following four-step construction was formulated:

1. Perform a principal components factor analysis of the tetrachoric correlation matrix and
retain the list of second-factor loadings, $\{\lambda_{j2} : j = 1, \ldots, J\}$.

2. Cast out individual items $X_j$ for which $|\lambda_{j2}| < M$ for some fixed cutoff $M$.

3. For each $k = 0, \ldots, J$, obtain covariance estimates $\widehat{\mathrm{Cov}}(X_i, X_j \mid X_+ = k)$ for all the item pairs
left after applying Step 2. (Note: $X_+$ is formed from all the items, but only covariances
among the items remaining after Step 2 are calculated in each $X_+$ cohort.)

(a) If $\lambda_{i2}\lambda_{j2}$ has the same sign as the estimate $\widehat{\mathrm{Cov}}(X_i, X_j \mid X_+ = k)$, retain this covariance;
otherwise cast it out.

(b) Calculate

$$\hat\varepsilon_{J,k} = \binom{J}{2}^{-1} \sum_{\text{remaining pairs}} \left|\widehat{\mathrm{Cov}}(X_i, X_j \mid X_+ = k)\right|,$$

where the sum is over all those pairs remaining after Steps 2 and 3a.

4. Calculate the new index $\hat\varepsilon_J = \sum_{k=0}^{J} \frac{N_k}{N}\, \hat\varepsilon_{J,k}$.

A rationale for this construction is most easily seen by contrasting a strictly unidimensional
test with a test consisting of two strictly unidimensional subtests, say half "math" items and half
"verbal" items. In the strictly unidimensional case, the first factor of a principal-components factor
analysis of the tetrachoric correlations will be close to the true ability factor underlying the test, and
the second factor will pick up only random variation in the data. Thus many of the second-factor
loadings $\lambda_{j2}$ should be quite small; these items are automatically dropped from the analysis by
Step 2. Many pairs of the remaining items will have $\widehat{\mathrm{Cov}}(X_i, X_j \mid X_+ = k) < 0$, and approximately half
of these should be dropped because the "random" second-factor loadings should satisfy $\lambda_{i2}\lambda_{j2} > 0$
about half the time. Thus most of the covariances are not included in the calculation in Steps 3b
and 4, and therefore $\hat\varepsilon_J \approx 0$ in the unidimensional case.

In the case of two different, strictly unidimensional subtests, the first factor of a
principal-components factor analysis of the tetrachoric correlations will be a general factor correlating highly
with the number-right score. The second factor will be a "contrast" (or bipolar) factor on which
items in one subtest, say the "math" items, load positively, and items in the other subtest,
say the "verbal" items, load negatively. A few items will again be cast out in Step 2 because they
do not load heavily enough on the contrast factor. Of those remaining, consider separately the
cases $\lambda_{i2}\lambda_{j2} > 0$ and $\lambda_{i2}\lambda_{j2} < 0$. If the product is positive, both items probably come from the
same subtest and we expect $\mathrm{Cov}(X_i, X_j \mid X_+ = k) > 0$ (since $X_+$ is summed over both subtests, it
is measuring "$\theta_{\text{math}} + \theta_{\text{verbal}}$"; if the items are both "math", the "verbal" component of $X_+$ will
tend to make the covariance positive, and vice versa). We would like to keep this covariance in the
calculation of $\hat\varepsilon_J$, and this is what Step 3a does. On the other hand, if the product is negative,
the items probably come from different subtests and we expect $\mathrm{Cov}(X_i, X_j \mid X_+ = k) < 0$ (this is
the non-unidimensional behavior that tests of "conditional association" (Holland and Rosenbaum,
1986) are designed to detect). We would also like to keep this covariance in the sum, and Step 3a
does this for us too. Thus most of the covariances are included in the calculation in Steps 3b
and 4, and therefore $\hat\varepsilon_J > 0$ in the non-unidimensional case.
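The four steps might be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation: to stay short it swaps Pearson (phi) correlations in for the tetrachoric correlations of Step 1, takes principal-component loadings as eigenvectors scaled by the square root of their eigenvalues, and assumes a $\binom{J}{2}^{-1}$ normalization in Step 3b and $N_k/N$ cohort weights in Step 4. The function name `refined_epsilon_hat` is made up for the example.

```python
import numpy as np


def refined_epsilon_hat(x, cutoff):
    """Four-step index for an (N, J) array of 0/1 responses (hypothetical sketch).

    Assumes no item is answered identically by every examinee (otherwise
    the correlation matrix is undefined).
    """
    x = np.asarray(x, dtype=float)
    n_examinees, n_items = x.shape
    # Step 1 (substitute): principal components of the phi correlation matrix.
    corr = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
    second = n_items - 2                            # index of second-largest eigenvalue
    loadings2 = eigvecs[:, second] * np.sqrt(max(eigvals[second], 0.0))
    # Step 2: cast out items whose second-factor loading is below the cutoff.
    keep = np.flatnonzero(np.abs(loadings2) >= cutoff)
    if keep.size < 2:
        return 0.0
    total = x.sum(axis=1)                           # X_+ uses ALL items
    upper = np.triu_indices(keep.size, k=1)
    # The sign of lambda_i2 * lambda_j2 does not depend on the eigenvector's arbitrary sign.
    loading_sign = np.sign(np.outer(loadings2[keep], loadings2[keep]))[upper]
    n_pairs = n_items * (n_items - 1) / 2.0         # assumed normalizer: J choose 2
    eps = 0.0
    for k in np.unique(total):
        cohort = x[total == k][:, keep]
        n_k = cohort.shape[0]
        centered = cohort - cohort.mean(axis=0)
        cov = (centered.T @ centered / n_k)[upper]
        # Step 3a: retain a covariance only if its sign matches the loading product.
        retained = np.sign(cov) == loading_sign
        eps_k = np.abs(cov[retained]).sum() / n_pairs   # Step 3b
        eps += (n_k / n_examinees) * eps_k              # Step 4 (assumed N_k / N weights)
    return eps
```

With `cutoff=0.0` every item survives Step 2 and Step 3a is the only filter; larger cutoffs can only remove terms, so the value never increases as the cutoff grows.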

Preliminary simulation and real-data studies with the index $\hat\varepsilon_J$ are quite promising, as Tables 1
and 2 show. In Table 1, the first simulation, marked "$d = 1$", is based upon a two-parameter logistic
model with discriminations $a_j \sim N(1.28, (0.8)^2)$, sampled until $0.5 < a_j < 3$, and difficulties
$b_j \sim N(-0.12, (0.84)^2)$, sampled until $-3 < b_j < 3$. The simulations marked "$d = 2$" are based
on tests consisting of two pure subtests with correlation $\rho_{\theta_1\theta_2} = 0.3$ between traits, and item
parameters generated according to the same distributions as in the $d = 1$ case, except as noted.
The simulations marked ASVAB AR and ASVAB AS are generated according to the three-parameter
logistic model, using the fixed item parameter estimates for particular administrations of the
Arithmetic Reasoning and Auto Shop sections of the Armed Services Vocational Aptitude Battery,
attributed to Bock by Nandakumar (1987).
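A minimal Python sketch of the simulation design for the "$d = 1$" and "$d = 2$" rows might look as follows (NumPy assumed; function names hypothetical). Several details are assumptions rather than statements from the text: logistic ICCs without the 1.7 scaling constant, abilities drawn from $N(0, 1)$, the first half of the items measuring trait 1 and the second half trait 2, and the reading of the $\sigma_a$ labels in Table 1 as the standard deviation of the discrimination distribution.

```python
import numpy as np


def truncated_normal(rng, mean, sd, low, high, size):
    """Draw N(mean, sd^2) values, redrawing any that fall outside (low, high)."""
    out = rng.normal(mean, sd, size)
    bad = (out <= low) | (out >= high)
    while bad.any():
        out[bad] = rng.normal(mean, sd, bad.sum())
        bad = (out <= low) | (out >= high)
    return out


def simulate_2pl(rng, n_examinees, n_items, a_sd=0.8, rho=None):
    """Simulate 0/1 responses from a 2PL model (rho=None) or from two pure subtests."""
    a = truncated_normal(rng, 1.28, a_sd, 0.5, 3.0, n_items)
    b = truncated_normal(rng, -0.12, 0.84, -3.0, 3.0, n_items)
    if rho is None:
        theta = rng.standard_normal((n_examinees, 1))
        trait = np.zeros(n_items, dtype=int)           # every item measures trait 0
    else:
        cov = [[1.0, rho], [rho, 1.0]]
        theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_examinees)
        trait = (np.arange(n_items) >= n_items // 2).astype(int)  # two pure subtests
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, trait] - b)))
    x = (rng.uniform(size=p.shape) < p).astype(int)
    return x, a, b


rng = np.random.default_rng(0)
x_d1, a, b = simulate_2pl(rng, 2000, 40)                      # like the d = 1 row
x_d2, _, _ = simulate_2pl(rng, 2000, 40, a_sd=0.6, rho=0.3)   # like a d = 2 row
```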

The most striking aspect of Table 1 is the marked contrast in the values of $\hat\varepsilon_J$ between the
one- and two-dimensional cases. This supports the rationale behind the construction of
$\hat\varepsilon_J$ above. It is also interesting to note the progression of values of the index as the second-factor
loading cutoff value $M$ increases from 0.0 to 0.2. In this range, increasing $M$ improves
the performance of $\hat\varepsilon_J$ in the unidimensional case without degrading its performance on strongly
two-dimensional data. By increasing $M$ to 0.2, we are able to effectively decrease the propensity
for making a Type I error without noticeably affecting Type II error.

Simulated data sets

                                                  $100\hat\varepsilon_J$
Data set                       J       N       M = .00   M = .10   M = .15   M = .20
$d = 1$                        40      2000      .84       .21       .15       .10
$d = 2$, $\sigma_a = 0.8$      20+20   2000     2.29      2.29      2.22      2.14
$d = 2$, $\sigma_a = 0.6$      20+20   2000     2.68      2.68      2.68      2.68
$d = 2$, $\sigma_a = 0.4$      20+20   2000     2.38      2.38      2.38      2.38
ASVAB AR ($d = 1$)             30      2000      .72       .49       .20       .06
ASVAB AS ($d = 1$)             25      2000      .75       .16       .07       .05

Table 1: $\hat\varepsilon_J$ applied to simulated data sets.

Real data sets

                                                  $100\hat\varepsilon_J$
Data set                       J       N       M = .00   M = .10   M = .15   M = .20
ACT F29B (math)                40      2491      .94       .55       .25       .07
ACT F29C (math)                40      2494      .96       .52       .26       .10
AR 10 (ASVAB)                  30      1984      .74       .28       .16       .04
AR 12 (ASVAB)                  30      1961      .74       .23       .17       .11

Table 2: $\hat\varepsilon_J$ applied to real data sets.

The $\hat\varepsilon_J$ index has also been applied to some real data sets, with the results in Table 2. The first
two lines of the table are from the Mathematics section of the ACT (American College Testing)
Assessment, Forms 29B and 29C. The next two lines are Arithmetic Reasoning sections of the
ASVAB.

These preliminary results suggest that $\hat\varepsilon_J$ is a promising global index of unidimensionality. Clearly
there is much more work to be done in understanding the performance of the index through
simulation experiments and in applying the index to real data sets. It would also be interesting to
compare $\hat\varepsilon_J$ to the $Q_3$ measure of LI model fit developed by Yen (1984). A more finely tunable
version of $\hat\varepsilon_J$, in which the cutoff parameter $M$ may take different values depending on the signs
of $\lambda_{i2}$ and $\lambda_{j2}$, will also be explored in future work.

7 A local index of unidimensionality 

An alternative to developing a single global index of unidimensionality is to try to develop an index
or diagnostic criterion which helps us understand the nature of violations of strict unidimensionality,
or identifies areas of the "unidimensional" ability scale in which ability estimation based on strictly
unidimensional assumptions may not succeed. The index $C_J(\theta)$ as described in (9) is such an index.
Under strict unidimensionality the asymptotic standard error of the MLE, for example, is

$$SE(\hat\theta_J) = \frac{1}{\sqrt{\bar I_J(\theta)}},$$

where $\theta$ is the true ability value for the examinee "generating" the response sequence from which
$\hat\theta_J$ is calculated, and $\bar I_J(\theta)$ is the average test information. However, we saw in Section 3 that
when strict unidimensionality fails, a correction using $C_J(\theta)$ from (9) is required:

$$SE^*(\hat\theta_J) = \frac{\sqrt{\bar I_J(\theta) + C_J(\theta)}}{\bar I_J(\theta)}.$$

Another way to measure the change in accuracy of ability estimation is to consider a corrected
information function

$$I_J^*(\theta) = \frac{\bar I_J(\theta)^2}{\bar I_J(\theta) + C_J(\theta)}.$$

Thus $C_J(\theta)$, if it could be estimated, would help us to identify exactly when ability estimation
based on a unidimensional model behaves as though the data were strictly unidimensional. Indeed
there are three interesting cases:

I. When $d_L = 1$ holds exactly, $C_J(\theta) = 0$ for all $\theta$, and the "corrected" standard error and
information functions reduce to the familiar traditional forms. More generally, if $C_J(\theta)$ hovers
near zero over the range of values of $\theta$ of interest, then it would seem reasonable to pursue
ability estimation assuming that the data strictly satisfy $d_L = 1$.

II. If $C_J(\theta)$ is clearly distinct from zero, but not large for most values of $\theta$ of interest, it may be
desirable to continue to use unidimensional ability estimation methods, but to use the corrected
standard error $SE^*$ in assessing the accuracy of ability estimation.

III. If $C_J(\theta)$ is quite large for many values of $\theta$ of interest, it is probably most desirable to abandon
unidimensional modeling completely and develop a multidimensional model for the data set.
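Given values of $\bar I_J(\theta)$ and $C_J(\theta)$ on a grid of $\theta$ values, the uncorrected and corrected quantities are simple arithmetic. A minimal sketch, assuming the forms $SE = 1/\sqrt{\bar I_J}$, $SE^* = \sqrt{\bar I_J + C_J}/\bar I_J$ and $I_J^* = \bar I_J^2/(\bar I_J + C_J)$, so that $SE^* = 1/\sqrt{I_J^*}$ (the function name is hypothetical):

```python
import numpy as np


def corrected_precision(info_bar, c_j):
    """Return (SE, SE*, I*) on a grid; requires info_bar > 0 and info_bar + c_j > 0."""
    info_bar = np.asarray(info_bar, dtype=float)
    c_j = np.asarray(c_j, dtype=float)
    se = 1.0 / np.sqrt(info_bar)                  # usual asymptotic SE
    se_star = np.sqrt(info_bar + c_j) / info_bar  # sandwich-type correction
    info_star = info_bar ** 2 / (info_bar + c_j)  # corrected information
    return se, se_star, info_star


se, se_star, info_star = corrected_precision([0.25, 0.30], [0.0, 0.10])
print(se_star[0] == se[0])  # True: no correction when C_J(theta) = 0
```

Case I corresponds to $C_J(\theta) \approx 0$, where the two standard errors agree; a positive $C_J(\theta)$ inflates $SE^*$ and deflates $I_J^*$, and a negative $C_J(\theta)$ does the reverse.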

In order to estimate $C_J(\theta)$ and $SE^*$, the following three quantities must be estimated (see (9)
on p. 6):

1. Item characteristic curves $P_j(\theta)$;

2. Derivatives of item log-odds ratios, $A_j'(\theta) = P_j'(\theta) / [P_j(\theta)(1 - P_j(\theta))]$;

3. "Local" item covariances $\mathrm{Cov}(X_i, X_j \mid \theta)$.

Estimates of the average test information $\bar I_J(\theta)$, the usual asymptotic MLE standard error $SE$,
$C_J(\theta)$ itself, and the corrected standard error $SE^*$ may be obtained as straightforward combinations
of the above quantities.
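One way to assemble $\bar I_J(\theta)$ and $C_J(\theta)$ from the three estimated quantities is sketched below. Equation (9) is not reproduced in this section, so the formulas are an assumption: they come from the variance of the LI score function $\sum_j A_j'(\theta)(X_j - P_j(\theta))$, giving $\bar I_J(\theta) = J^{-1}\sum_j A_j'(\theta)^2 P_j(\theta)(1 - P_j(\theta))$ and $C_J(\theta) = (2/J)\sum_{i<j} A_i'(\theta)A_j'(\theta)\mathrm{Cov}(X_i, X_j \mid \theta)$, whose summands are the "weighted" covariances plotted in Figures 3 and 4. The function name is hypothetical.

```python
import numpy as np


def info_and_c(p, dlogodds, local_cov):
    """Average information and C_J at one theta value (assumed forms, see text).

    p         : (J,) ICC values P_j(theta)
    dlogodds  : (J,) derivatives A_j'(theta) of the item log-odds
    local_cov : (J, J) local covariances Cov(X_i, X_j | theta)
    """
    p = np.asarray(p, dtype=float)
    dlogodds = np.asarray(dlogodds, dtype=float)
    local_cov = np.asarray(local_cov, dtype=float)
    n_items = p.size
    # P_j'^2 / (P_j (1 - P_j)) rewritten as A_j'^2 * P_j (1 - P_j)
    info_bar = np.mean(dlogodds ** 2 * p * (1.0 - p))
    upper = np.triu_indices(n_items, k=1)
    weighted = np.outer(dlogodds, dlogodds)[upper] * local_cov[upper]
    c_j = 2.0 * weighted.sum() / n_items
    return info_bar, c_j


info_bar, c_j = info_and_c([0.5, 0.5], [1.0, 1.0], [[0.25, 0.10], [0.10, 0.25]])
print(info_bar, c_j)  # 0.25 0.1
```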
In general there are two ways to tackle this problem. One is to explicitly model the
anticipated dependence in the data. This is the approach of Gibbons, Bock and Hedeker (1989), for
example. Consider a multidimensional compensatory IRT model with normal ogive item
characteristic curves (ICC's). An appropriate, and equivalent, reformulation of the problem is to consider
underlying "propensity variables" $Y_1, Y_2, \ldots, Y_J$, such that

$$X_j = 1 \text{ if and only if } Y_j > \gamma_j,$$

where $Y_1, Y_2, \ldots, Y_J$ are independent $N\left(\sum_m \lambda_{jm}\theta_m,\; 1 - \sum_m \lambda_{jm}^2\right)$ random variables, given the
multidimensional latent trait $\Theta = (\theta_1, \ldots, \theta_d)'$, and $\Theta \sim N(0, I_{d \times d})$. The thresholds $\gamma_j$ correspond
to difficulty parameters, and the coefficients $\lambda_{jm}$ correspond to discrimination parameters. (This
is also the formulation of item factor analysis which underlies the factor analysis of tetrachoric
correlations in Section 6 above, in which the $\lambda_{jm}$ are the $m$th factor loadings.) Gibbons, Bock
and Hedeker (1989) consider a slightly different formulation of the problem, in which $\theta \sim N(0, 1)$
is unidimensional, and $(Y_1, \ldots, Y_J)' \sim N((\lambda_1\theta, \ldots, \lambda_J\theta)', \Sigma)$, given $\theta$, for some covariance matrix
$\Sigma$. Clearly, if the $\lambda_j$, $\gamma_j$ and $\Sigma$ could be estimated, estimates of $C_J(\theta)$, $SE$ and $SE^*$ would follow
naturally from these and the known normal ogive form of the ICC's. However, in our early attempts
to use this model, we have found the parameter estimates to be too unstable to be of use, especially
for tests of more than a handful of items. Nevertheless this is an interesting and attractive
approach which ought to receive more attention in the future.
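The propensity-variable formulation translates directly into a simulation recipe. The minimal sketch below (NumPy and SciPy assumed; function names hypothetical) generates data by thresholding the $Y_j$ and checks numerically that this reproduces the normal ogive ICC $P(X_j = 1 \mid \Theta) = \Phi\left((\sum_m \lambda_{jm}\theta_m - \gamma_j)/\sqrt{1 - \sum_m \lambda_{jm}^2}\right)$.

```python
import numpy as np
from scipy.stats import norm


def normal_ogive_icc(theta, loadings, threshold):
    """P(X_j = 1 | Theta = theta) implied by the propensity model for one item."""
    loadings = np.asarray(loadings, dtype=float)
    resid_sd = np.sqrt(1.0 - loadings @ loadings)   # needs sum of squared loadings < 1
    return norm.cdf((loadings @ np.asarray(theta, dtype=float) - threshold) / resid_sd)


def simulate_propensity(rng, theta, loadings, thresholds):
    """theta: (N, d) abilities; loadings: (J, d); thresholds: (J,). Returns (N, J) 0/1 data."""
    mean = theta @ loadings.T                              # sum_m lambda_jm theta_m
    resid_sd = np.sqrt(1.0 - np.sum(loadings ** 2, axis=1))
    y = mean + resid_sd * rng.standard_normal(mean.shape)  # independent given theta
    return (y > thresholds).astype(int)                    # X_j = 1 iff Y_j > gamma_j


rng = np.random.default_rng(2)
theta_fixed = np.tile([0.5, -0.2], (200_000, 1))           # many examinees at one theta
lam = np.array([[0.6, 0.3]])
gamma = np.array([0.1])
freq = simulate_propensity(rng, theta_fixed, lam, gamma).mean()
print(freq, normal_ogive_icc([0.5, -0.2], lam[0], 0.1))
```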

A second approach to the problem of estimating ICC's and local item correlations for possibly
non-unidimensional data may be based on the nonparametric rank regression methods of Ramsay
(1990). Two important observations underlie Ramsay's approach. The first is that we can sidestep
the usual identifiability problem for the ability distribution (one aspect of which is that ability
estimates are only determined up to rank ordering in the usual IRT formulations) by fixing the
distribution of ability (estimates) in advance and allowing quite general ICC shapes in order to fit the
observed item response distribution. Ramsay's second observation is that very simple ability
estimates, based on number-right scores and similar quantities, are quite adequate as "initial guesses"
for constructing ICC estimates. This second observation harmonizes nicely with the observation
of Stout (1990b) that, under essential unidimensionality, $\bar P_J^{-1}(\bar X_J) - \theta \to 0$, as well as with the more
traditional view that when an unrotated principal-components factor analysis of binary items is
performed, the first factor (corresponding to the largest eigenvalue) is usually strongly related to
the total score on the test (whether or not the test is unidimensional).

In our implementation of Ramsay's method, we obtained approximately $N(0, 1)$-distributed
ability estimates by inverse-probability transforms of the ranks of examinees' number-right scores.
Let us call these crude ability estimates $t_1, t_2, \ldots, t_N$. Also, let $w(t)$ be the standard normal density.
Then $P_j(\theta)$ can be estimated nonparametrically using the Nadaraya-Watson kernel regression
formula

$$\hat P_j(\theta) = \frac{\sum_{n=1}^{N} w\left(\frac{\theta - t_n}{h}\right) x_{nj}}{\sum_{n=1}^{N} w\left(\frac{\theta - t_n}{h}\right)},$$

where $h > 0$ is a "window width" or "bandwidth" tuning parameter, and $(x_{n1}, x_{n2}, \ldots, x_{nJ})$ is the
observed response pattern of the $n$th examinee, $n = 1, \ldots, N$. The derivatives $P_j'(\theta)$ may be crudely
estimated by considering equally spaced points $s_1, \ldots, s_K$ in the interval $[-3, 3]$ and calculating the
difference quotients

$$\hat P_j'(s_k) = \frac{\hat P_j(s_{k+1}) - \hat P_j(s_k)}{s_{k+1} - s_k}.$$
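(More sophisticated kernel estimates of the derivatives can be obtained, but these crude estimates
were quick and adequate for our initial investigations.) Finally, Ramsay's method was extended to
calculate the local item covariances according to the formula

$$\widehat{\mathrm{Cov}}(X_i, X_j \mid \theta) = \frac{\sum_{n=1}^{N} w\left(\frac{\theta - t_n}{h}\right)\left(x_{ni} - \hat P_i(\theta)\right)\left(x_{nj} - \hat P_j(\theta)\right)}{\sum_{n=1}^{N} w\left(\frac{\theta - t_n}{h}\right)}.$$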
In our work, all quantities were evaluated at $K = 32$ equally spaced points $s_1, s_2, \ldots, s_{32}$ in $[-3, 3]$,
with window width $h = 0.3$. Calculations were performed in the statistical package "New S" on
DECstation 3100s. With default memory allocations in S, data sets with up to $N = 500$ examinees
and up to approximately $J = 50$ items could be examined. Work on $C_J(\theta)$ is still in preliminary
stages, but we provide some illustrative examples.
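The rank regression steps described above might be sketched in Python as follows (NumPy and SciPy assumed). This is not the authors' S code. The offset $(\text{rank} - 0.5)/N$ in the inverse-probability transform, average ranks for tied number-right scores, forward difference quotients, and the kernel-weighted residual form of the local covariances are assumptions of the sketch; $h = 0.3$ and $K = 32$ points on $[-3, 3]$ are taken from the text. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def crude_abilities(x):
    """Approximately N(0,1) ability estimates from ranks of number-right scores."""
    total = np.asarray(x).sum(axis=1)
    ranks = rankdata(total)                      # tied scores get their average rank
    return norm.ppf((ranks - 0.5) / total.size)  # inverse-probability transform


def rank_regression(x, h=0.3, n_grid=32):
    """Kernel (Nadaraya-Watson) ICCs, derivative estimates and local covariances."""
    x = np.asarray(x, dtype=float)
    t = crude_abilities(x)
    grid = np.linspace(-3.0, 3.0, n_grid)
    # w((s_k - t_n) / h) with w the standard normal density, shape (K, N)
    w = norm.pdf((grid[:, None] - t[None, :]) / h)
    w /= w.sum(axis=1, keepdims=True)            # normalize weights at each grid point
    p_hat = w @ x                                # (K, J) estimated P_j(s_k)
    # forward difference quotients for P_j'(s_k), shape (K - 1, J)
    dp_hat = np.diff(p_hat, axis=0) / np.diff(grid)[:, None]
    # kernel-weighted covariances of residuals x_nj - P_j(s_k), shape (K, J, J)
    local_cov = np.empty((n_grid, x.shape[1], x.shape[1]))
    for k in range(n_grid):
        resid = x - p_hat[k]
        local_cov[k] = (w[k][:, None] * resid).T @ resid
    return grid, p_hat, dp_hat, local_cov


rng = np.random.default_rng(3)
theta = rng.standard_normal(500)
p_true = 1.0 / (1.0 + np.exp(-theta[:, None]))   # ten identical Rasch-type items
x_sim = (rng.uniform(size=(500, 10)) < p_true).astype(int)
grid, p_hat, dp_hat, local_cov = rank_regression(x_sim)
```

Log-odds derivatives $A_j'(s_k)$ could then be formed as `dp_hat / (p_hat * (1 - p_hat))` after matching grid points, which is one place where poor estimates in the tails can creep in.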

To illustrate the method, let us simulate one- and two-dimensional tests with $J = 32$ items
and $N = 500$ examinees, with compensatory two-parameter logistic item parameters as in Table 3
(examinee abilities in all dimensions are sampled from $N(0, 1)$ as usual). Note that the
one-dimensional item parameters are the averages of the two-dimensional parameters.
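A minimal sketch of how such paired data sets might be generated (NumPy assumed; function names hypothetical). The compensatory form $\mathrm{logit}\,P_j = \sum_m a_{jm}(\theta_m - b_{jm})$, with one difficulty per dimension, is our reading of Table 3, and the absence of a 1.7 scaling constant is an assumption. The traits are drawn independently, matching $\rho = 0$ in Table 3, and the example uses the two-dimensional parameters of items 5 and 6.

```python
import numpy as np


def compensatory_2pl(theta, a, b):
    """theta: (N, d); a, b: (J, d). P_j = logistic(sum_m a_jm (theta_m - b_jm))."""
    z = np.einsum("jm,njm->nj", a, theta[:, None, :] - b[None, :, :])
    return 1.0 / (1.0 + np.exp(-z))


def simulate_pair(rng, a2, b2, n_examinees=500):
    """Two-dimensional data from (a2, b2) and one-dimensional data from their row averages."""
    a2, b2 = np.asarray(a2, dtype=float), np.asarray(b2, dtype=float)
    a1 = a2.mean(axis=1, keepdims=True)              # one-dimensional parameters are averages
    b1 = b2.mean(axis=1, keepdims=True)
    theta2 = rng.standard_normal((n_examinees, 2))   # independent N(0,1) traits
    theta1 = rng.standard_normal((n_examinees, 1))
    x2 = (rng.uniform(size=(n_examinees, len(a2))) < compensatory_2pl(theta2, a2, b2)).astype(int)
    x1 = (rng.uniform(size=(n_examinees, len(a1))) < compensatory_2pl(theta1, a1, b1)).astype(int)
    return x1, x2


a_items = [[1.02, 1.09], [1.02, 0.90]]            # items 5 and 6, Table 3
b_items = [[0.5738, -1.2352], [-0.0441, 0.0903]]
x1, x2 = simulate_pair(np.random.default_rng(4), a_items, b_items)
```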

Since this work is in part a replication of Ramsay's method, it is interesting to see how well
the rank regression method recovers ICC's. In Figure 1 we have graphed a few one-dimensional
logistic ICC's (symbol ".") using the parameters on the left in Table 3. Overlaid on these are the
unidimensional rank regression ICC estimates from $N = 500$ simulated examinees taking the
one-dimensional items in Table 3 (symbol "*"). The rank regression ICC estimates appear to recover
the original one-dimensional ICC's quite well.

On the other hand, consider Figure 2. The ICC's marked "." are the marginal ICC's
$P_j(\theta_1) = \int P_j(\theta_1, \theta_2) f(\theta_2 \mid \theta_1)\, d\theta_2$, where $P_j(\theta_1, \theta_2)$ are compensatory logistic ICC's using the item parameters
on the right in Table 3. Overlaid (symbol "*") are the unidimensional rank regression ICC estimates
from Ramsay's method (again using $N = 500$ simulated examinees). As expected, the estimated
ICC's in Figure 2 do not match the theoretical ICC's nearly as well as in Figure 1. (This assumes
that $\theta_1$ is the ability we intend to measure; in the future we would prefer to compare the rank
regression ICC estimates with marginal ICC's for Wang's (1986, 1987) "reference composite".)
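Because the traits are independent here ($\rho = 0$ in Table 3), $f(\theta_2 \mid \theta_1)$ is just the standard normal density, and the marginal ICC is a one-dimensional integral that Gauss-Hermite quadrature handles well. A minimal sketch under the same assumed compensatory logistic form (function name hypothetical):

```python
import numpy as np


def marginal_icc(theta1, a, b, n_nodes=40):
    """P_j(theta1) = E[P_j(theta1, theta2)] with theta2 ~ N(0, 1) independent of theta1.

    a, b: length-2 sequences (a_j1, a_j2), (b_j1, b_j2) for a compensatory logistic item.
    """
    # Probabilists' Gauss-Hermite rule: weights for exp(-x^2 / 2), summing to sqrt(2 pi).
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)      # now integrates against N(0, 1)
    theta1 = np.atleast_1d(np.asarray(theta1, dtype=float))
    z = a[0] * (theta1[:, None] - b[0]) + a[1] * (nodes[None, :] - b[1])
    return (1.0 / (1.0 + np.exp(-z))) @ weights


grid = np.linspace(-3.0, 3.0, 7)
print(marginal_icc(grid, (1.02, 1.09), (0.5738, -1.2352)))  # item 5, Table 3
```

The marginal ICC is flatter than the conditional ICC in $\theta_1$ alone, since variation in $\theta_2$ is averaged into it; this is one reason a unidimensional fit to two-dimensional data need not look like any single item's conditional curve.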

To illustrate the summands for our estimate of $C_J(\theta)$ we may consider Figures 3 and 4, in
which rank regression estimates of the covariances $\mathrm{Cov}(X_i, X_j \mid \theta)$ (symbol ".") and the "weighted"
covariances $A_i'(\theta)A_j'(\theta)\mathrm{Cov}(X_i, X_j \mid \theta)$ (symbol "*") are depicted. Since the data for Figure 3 come
from a strictly unidimensional model, we know that the theoretical value of $\mathrm{Cov}(X_i, X_j \mid \theta)$ is zero
in Figure 3 (which is shown as a horizontal line). The estimated covariances do indeed hover around
zero (note that the vertical scale typically ranges from about -0.04 to +0.10).

On the other hand, we expect $\mathrm{Cov}(X_i, X_j \mid \theta)$ to be positive in Figure 4, because the data come
from a two-dimensional model and we are only conditioning on a one-dimensional $\theta$. The estimated
covariances in Figure 4 do seem to range about twice as far from zero, on average, as the covariance
estimates for unidimensional data did. The fact that the estimates sometimes dip below zero in
both Figures 3 and 4 is probably related to the tendency for $\mathrm{Cov}(X_i, X_j \mid X_+)$ to be negative in
typical IRT data (see Junker, 1991a, as well as the discussion of the ASVAB illustration below).

One-dimensional data parameters: ...

Two-dimensional data parameters ($J = 32$, $N = 500$, $d = 2$, $\rho = 0$)

Item    a_1     a_2     b_1        b_2        c
1-4     ...     ...     ...        ...        ...
5       1.02    1.09     0.5738    -1.2352    0
6       1.02    0.90    -0.0441     0.0903    0
7       0.99    0.99    -0.8639     0.6787    0
8       1.87    0.53    -1.3073    -0.9015    0
9       1.58    1.19     0.8938    -0.2934    0
10      1.02    1.55    -0.1434     0.3596    0
11      1.53    1.34    -0.3824     1.0370    0
12      0.81    2.14    -0.7693     0.1253    0
13      0.62    1.78    -0.8339     0.1558    0
14      1.87    1.74    -0.3210    -0.3886    0
15      1.59    2.14    -0.1208    -0.2263    0
16      1.75    1.02     0.0816    -0.5054    0
17      1.64    1.71    -0.3706     0.1931    0
18      1.93    0.93    -1.3920     0.3943    0
19      1.44    1.46     0.4418    -0.6704    0
20      0.87    1.15     0.0551    -0.2432    0
21      1.05    ...      0.4246    -0.6180    0
22      2.03    1.67     0.6099    -1.3403    0
23      1.88    1.09    -1.3022    -0.8211    0
24      0.82    0.65     0.6171    -0.5866    0
25      1.87    1.05    -0.0511    -0.4483    0
26      1.71    1.37    -0.8606    -0.2236    0
27      2.14    1.42    -0.1327    -0.3973    0
28      1.26    1.43    -0.7932    -0.3521    0
29      1.29    0.93    -0.0478     0.0741    0
30      1.78    1.26     0.5742     0.5003    0
31      1.32    0.69    -0.6269     0.4628    0
32      0.65    1.35    -0.3338     0.0078    0

Table 3: Item parameters for illustration.

Figure 2: One-dimensional ICC's for two-dimensional data.

Figure 3: Unidimensional local covariance estimates for one-dimensional data.

Estimates of $C_J(\theta)$ for the two data sets are compared in Figure 5. Note how much higher $C_J(\theta)$
is for the two-dimensional data set than for the one-dimensional data set. In this case the $C_J(\theta)$'s
are easy to compare, since they are based on data generated from similar models (both use logistic
ICC's, and the one-dimensional parameters are the averages of the corresponding two-dimensional
parameters) which differ only in latent space dimensionality.

The extent to which the unidimensional information and asymptotic MLE standard errors are
too optimistic for the two-dimensional data set is illustrated in Figure 6. The graph on the left
in Figure 6 is again $C_J(\theta)$ for this data set. In the center and rightmost graphs in Figure 6, the
uncorrected $SE$ and information functions are plotted with the symbol "." and the corrected $SE^*$
and information functions are plotted with "*". The vertical scale for the center graph ranges from
2.0 to 10.0 and for the right graph from 0.0 to 1.2.

Let us turn to another illustration. We have simulated $N = 500$ examinee response strings to
three-parameter logistic items whose parameters were estimated from the Arithmetic Reasoning and
Auto Shop sections of the Armed Services Vocational Aptitude Battery (these are the same item
parameters as used for the ASVAB simulations in Table 1 above). Figures 7 and 8 illustrate the
uncorrected and corrected MLE standard errors for these ASVAB-AR and ASVAB-AS data sets.
Once again the leftmost graph is our estimate of $C_J(\theta)$, and the middle and rightmost graphs contrast
the (estimated) uncorrected SE and information functions (symbol ".") with the (estimated)
$C_J(\theta)$-corrected quantities (symbol "*"). In both figures, most of the "action" in $C_J(\theta)$ is in the range
-0.1 to 0.3. The MLE standard errors hover around 2.0, which seems a bit high, but it is worth
noting that the corrected standard errors are not much different from the uncorrected ones. The
story is similar for the corrected and uncorrected information functions, which effectively range
from about 0.0 to 1.0 or so. Thus, as one would hope for unidimensional data, $C_J(\theta)$ did not
"overcorrect" the unidimensional SE and information estimates.

The fact that $C_J(\theta)$ is negative for moderately low values of $\theta$ in Figures 7 and 8 is interesting:
as observed above in Section 6, $\mathrm{Cov}(X_i, X_j \mid X_+)$ tends to be negative for unidimensional data; see
Junker (1991a). The presence of the nonzero guessing parameter tends to make low-ability responses
independent (without having to condition on $\theta$), and this makes negative values for $\mathrm{Cov}(X_i, X_j \mid X_+)$
even more likely. (On the other hand, the extreme positive values of $C_J(\theta)$ near $\theta = -3$ are probably
due to poor estimates of $A_j'(\theta)$.) The standard error and information graphs in Figures 7 and 8
suggest the uncorrected quantities are adequate for measuring variability of MLE ability estimates
for these items.

Figure 4: Unidimensional local covariance estimates for two-dimensional data.

Figure 5: Comparison of $C_J(\theta)$ for one- and two-dimensional data.

Figure 7: Accuracy of unidimensional ability estimates for simulated ASVAB-AR.

Our last illustration is a simulated paragraph-comprehension data set. The test consists of eight
5-item testlets (this nice term comes from Wainer and Lewis, 1990). The item response functions
were compensatory logistic, with the first five items loading only on $\theta_1$ and $\theta_2$, the next five items
loading only on $\theta_1$ and $\theta_3$, the next five on $\theta_1$ and $\theta_4$, and so on, such that nine latent traits are
needed to achieve local independence in this model. The discriminations $a_j$ in each dimension
were sampled from $N(1.20, (0.8)^2)$ until $0.5 < a_j < 3$, and difficulties $b_j$ in each dimension were
sampled from $N(-0.12, (0.84)^2)$ until $-4 < b_j < 4$. There were no guessing parameters. Recall
from Example 5.1 that a test constructed in this way will be essentially unidimensional, $d_E = 1$,
with respect to the dominant dimension $\theta_1$. As usual, abilities in each dimension were generated
to be i.i.d. $N(0, 1)$.
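The testlet design might be simulated along the following lines (NumPy assumed; function names hypothetical). The compensatory logistic form without a 1.7 scaling constant is an assumption; the discrimination and difficulty distributions, the eight 5-item testlets, and the nine i.i.d. $N(0, 1)$ traits follow the description above.

```python
import numpy as np


def truncated_normal(rng, mean, sd, low, high, size):
    """Draw N(mean, sd^2) values, redrawing any that fall outside (low, high)."""
    out = rng.normal(mean, sd, size)
    bad = (out <= low) | (out >= high)
    while bad.any():
        out[bad] = rng.normal(mean, sd, bad.sum())
        bad = (out <= low) | (out >= high)
    return out


def simulate_testlets(rng, n_examinees=500, n_testlets=8, per_testlet=5):
    """Each item loads on theta_1 and on one testlet-specific trait (n_testlets + 1 traits)."""
    n_items = n_testlets * per_testlet
    theta = rng.standard_normal((n_examinees, n_testlets + 1))      # i.i.d. N(0, 1) traits
    a = truncated_normal(rng, 1.20, 0.8, 0.5, 3.0, (n_items, 2))    # (dominant, nuisance)
    b = truncated_normal(rng, -0.12, 0.84, -4.0, 4.0, (n_items, 2))
    # column 0 of theta is theta_1; testlet t (0-based) uses column t + 1
    nuisance = 1 + np.repeat(np.arange(n_testlets), per_testlet)
    z = a[:, 0] * (theta[:, [0]] - b[:, 0]) + a[:, 1] * (theta[:, nuisance] - b[:, 1])
    p = 1.0 / (1.0 + np.exp(-z))
    x = (rng.uniform(size=p.shape) < p).astype(int)
    return x, theta


x_testlets, theta = simulate_testlets(np.random.default_rng(5))
```

Because each nuisance trait touches only five of the forty items, the local dependence it induces is confined to within-testlet pairs, which is what makes the test essentially unidimensional in $\theta_1$.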

In Figure 9, rank regression ICC estimates are compared, for a handful of items, with marginal
ICC's (here $\theta_1$ is undoubtedly the ability "intended to be measured," though again a comparison
with Wang's reference composite may be more appropriate). The rank regression ICC's match the
marginal ICC's with respect to the dominant dimension $\theta_1$ quite well. This suggests that for at
least some $d_E = 1$ data sets, ICC's with respect to the dominant dimension can be recovered.

Figure 10 gives a graph of $C_J(\theta)$ based on these ICC estimates, and compares uncorrected
(symbol ".") and corrected (symbol "*") measures of the variability of MLE ability estimates
based on the rank regression ICC's. Here $C_J(\theta)$ hovers between about 0.10 and 0.25, the standard
errors hover just below 2.0, and the information functions lie mostly between 0.2 and 0.4. Though
there clearly would be some gain in employing a multidimensional model for this type of data, it
is debatable whether it would be worthwhile, especially if the desire is to measure the dominant
dimension only.

Comparing especially Figures 6, 7, 8 and 10, it appears that $C_J(\theta)$ is a promising local index
of unidimensionality. Clearly $C_J(\theta)$ depends heavily on the local behavior of ICC's with respect
to the dominant dimension being measured by the test, and especially on item parameters such as
discrimination and guessing. Much more work needs to be done to understand this sensitivity and
distinguish it from sensitivity to true multidimensionality in the data. It would also be interesting
to run parallel studies of $C_J(\theta)$ and the $\hat\varepsilon_J$ index of Section 6, to see if they detect the same, or
different, features of multidimensionality in item response data. Ultimately our goal is a prescriptive
one: do use MLE, don't use MLE, do trust asymptotic normality, etc., depending on the size(s) of
the indices. The work reported here suggests that such a goal should eventually be achievable.